Home » SQL & PL/SQL » SQL & PL/SQL » Telephone problem
Telephone problem [message #212926] Mon, 08 January 2007 19:22 Go to next message
aaapittsburgh
Messages: 4
Registered: January 2007
Junior Member
Hi,

The following is a problem I need help with and I am willing to pay for help if necessary. Any info would be greatly appreciated.

Two tables in a database:

Table1 contains a list of phone numbers
Table2 contains a list of phone numbers as well

I would like to create a Table3, in which Table3 contains all numbers
from Table1 that is not in Table2. I am looking for the shortest
runtime possible, keeping in mind that you can use whatever method(s)
you deem necessary.

Table1 contains 30 Million rows,

Table2 contains 2000 rows.

given a regular SQL expression, it will yield Big O(m*n)

Where m = rowcount of Table1
and n = rowcount of Table2

Generate for me, a method in which, runtime will yield Big O (m log2
n).

I don't need code, I want to hear your logic. Table1 is customers, Table2 is a list of prepaid phone numbers. Table3 is list of people to bill.
Re: Telephone problem [message #212927 is a reply to message #212926] Mon, 08 January 2007 19:26 Go to previous messageGo to next message
BlackSwan
Messages: 25040
Registered: January 2009
Location: SoCal
Senior Member
select phone_no from table1
minus
select phone_no from table2;
Re: Telephone problem [message #212931 is a reply to message #212927] Mon, 08 January 2007 19:46 Go to previous messageGo to next message
aaapittsburgh
Messages: 4
Registered: January 2007
Junior Member
anacedent wrote on Mon, 08 January 2007 19:26
select phone_no from table1
minus
select phone_no from table2;


I thought this was the answer as well, but I guess I over analyzed it and thought it was too simplistic. Can anyone else confirm this answer?
Re: Telephone problem [message #212933 is a reply to message #212926] Mon, 08 January 2007 19:50 Go to previous messageGo to next message
BlackSwan
Messages: 25040
Registered: January 2009
Location: SoCal
Senior Member
It appears somebody does NOT believe in 3rd Normal Form.
IMO, the two tables should be a single table with a PRE_PAID column flag
Re: Telephone problem [message #212934 is a reply to message #212931] Mon, 08 January 2007 19:58 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
The MINUS will yeild a result i the order of MlogM + NlogN, beacuse both tables are sorted.

SELECT phone
FROM table1
WHERE NOT EXISTS (
  SELECT 1
  FROM   table2
  WHERE  table2.phone = table1.phone)


The above will be of the order of Mlog2N if table2.phone is indexed. It will perform M scans of the index - each index scan will be of the order of log2N.

This is very inexact, because the base of the LOG is wrong. You would need to replace it with the number of index entries per block. This is also ignoring the cost of reading table1, and the cost of writing table3.

Ross Leishman
Re: Telephone problem [message #212951 is a reply to message #212927] Mon, 08 January 2007 22:39 Go to previous messageGo to next message
aaapittsburgh
Messages: 4
Registered: January 2007
Junior Member
anacedent wrote on Mon, 08 January 2007 19:26
select phone_no from table1
minus
select phone_no from table2;


The guy who wrote this problem just told me that this answer is incorrect.

His response to me was:

it's not too simplistic, but it is incorrect. This will still
yield
a big O(m*n).

Give it one more try, you are thinking too much in terms of DB..

Ask yourself, what are the only structures that would yield
BigO(nlog2n)? Answer that, and you will get your answer..


Any ideas?
Re: Telephone problem [message #212952 is a reply to message #212926] Mon, 08 January 2007 22:51 Go to previous messageGo to next message
BlackSwan
Messages: 25040
Registered: January 2009
Location: SoCal
Senior Member
The CORRECT answer can be obtained with a single FTS of Table2 & then a single FTS of Table1.
If the PHONE_NO of Table1 record matches any read from Table2, it is NOT included as part of the solution, otherwise it is.
IMO, the MINUS is O(Table1+Table2)
Re: Telephone problem [message #213226 is a reply to message #212952] Tue, 09 January 2007 20:30 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
And what is the cost of the sorts used by the MINUS operation? Assuming Oracle uses the QuickSort algorithm, a QuickSort performs in the order of N(log(N)) operations, where N is the number of rows to be sorted. Making the total O(MlogM + NlogN).

Go back and look over my earlier post.

Ross Leishman
Re: Telephone problem [message #213468 is a reply to message #212926] Wed, 10 January 2007 21:27 Go to previous messageGo to next message
BlackSwan
Messages: 25040
Registered: January 2009
Location: SoCal
Senior Member
When I recently implemented a similar solution in PERL, the smaller table was stored in a HASH data type & NO sorting was required.
Re: Telephone problem [message #213504 is a reply to message #213468] Thu, 11 January 2007 00:41 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
You could get the same algorithm achieved in Oracle with the NOT EXISTS if TABLE2 was implemented as a Hash Cluster.

Alternatively, by transforming it to a NOT IN sub-query, you could do it the same way without the hash cluster. This would give you O(m+n), but I didn't mention it earlier because the OP specifically requested O(Mlog2N).

Ross Leishman
Re: Telephone problem [message #213546 is a reply to message #213504] Thu, 11 January 2007 04:01 Go to previous messageGo to next message
William Robertson
Messages: 1640
Registered: August 2003
Location: London, UK
Senior Member
I'm puzzled - is the question about finding an efficient solution to a real business problem, or is it a puzzle to do with writing a query that will "yield" some theoretical metric? I don't get it.

If it's supposed to be a real problem, then it is pretty absurd to say that the perfecly sound MINUS approach is incorrect on the grounds that it doesn't yield BigO(nlog2n), whatever the hell that is supposed to mean. If it's some kind of mathematical puzzle, then count me out.
Re: Telephone problem [message #213682 is a reply to message #213546] Thu, 11 January 2007 19:12 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I don't think its a real-world problem. The purpose seems to be to get the student to THINK about how Oracle evaluates a query, and therefore how that query will scale.

The point about MINUS is that it is very cometitive with other methods at small-medium volumes, because all of the SORT actions are performed in memory; it just doesn't scale as well as the big+small hash join or hash semi join.

Count me in, because I'm always looking for ways to make programmers think about scalability.

Ross Leishman
Re: Telephone problem [message #213683 is a reply to message #213546] Thu, 11 January 2007 19:20 Go to previous messageGo to next message
aaapittsburgh
Messages: 4
Registered: January 2007
Junior Member
It IS a real world problem. The person who asked me to solve this worked for a telephone company and this was an actual problem he had to solve. As far as the answer goes, rleishman got it right in the 5th post of this thread.

However, the person who told me the problem now admits that there were actually a few ways to solve it: QuickSort, Red Black sort, and heap sort. They all allow you to break it down to O(mlog2n).
Re: Telephone problem [message #213685 is a reply to message #213683] Thu, 11 January 2007 19:23 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
In that case, tell your friend to try:
SELECT phone
FROM table1
WHERE table1.phone NOT IN (
  SELECT phone
  FROM   table2
)


It will give O(m+n)

Ross Leishman
Re: Telephone problem [message #213797 is a reply to message #213683] Fri, 12 January 2007 06:05 Go to previous messageGo to next message
William Robertson
Messages: 1640
Registered: August 2003
Location: London, UK
Senior Member
If it's a real-world Oracle problem, why is the person who asked the question not interested in actual Oracle performance metrics such as IO as reported by autotrace, session trace etc, let alone actual response time? The table volumes, database resources and optimizer-related settings will also make a difference, as will the presence of indexes and the use of different structures such as index-organized tables, partitions and clusters. Yet none of this is apparently as relevant to the questioner as whether or not it will yield a big O (I so thought that meant something else), and the desired answer turns out to be either QuickSort, Red Black Sort or Heap Sort. I would be very interested in seeing the page in the Oracle documentation where those options are described, because I couldn't find it.

That's an interesting point about the set operators. We could also discuss how IN and EXISTS tend to be internally rewritten to the same hash join, and how 10g can drive that hash join from the smaller table, but what would be the point?

Out of interest, what is O(mlog2n)?
Re: Telephone problem [message #213934 is a reply to message #213797] Fri, 12 January 2007 17:55 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I took it to mean O = Order of Magnitude. Mlog2N is M multiplied by the base-2 log of N. Although, as I pointed out earlier, 2 is the wrong base for an index.

The whole point of orders of magnitude is in determining scalability. As M increases, what happens to the query runtime. O(Mlog2N) tells us that if N stays constant, we can increase the size of table M and achieve linear scalability. This is something that indexing, partitioning, and clustering will not tell us.

In my opinion, the fact that scalability is hardly addressed in the Performance Tuning manual is a big hole.


Ross Leishman
Re: Telephone problem [message #213972 is a reply to message #212926] Sat, 13 January 2007 03:25 Go to previous messageGo to next message
Rehan Mirza
Messages: 63
Registered: July 2004
Location: Pakistan
Member

Select T1.PhoneNO
From Table1 T1, Table2 T2
where T1.PhoneNo = T2.PhoneNo (+)
and (T2.PhoneNo is Null)



Try this Query and reply me
Re: Telephone problem [message #214008 is a reply to message #213972] Sat, 13 January 2007 09:00 Go to previous messageGo to next message
William Robertson
Messages: 1640
Registered: August 2003
Location: London, UK
Senior Member
Rehan Mirza wrote on Sat, 13 January 2007 03:25
Select T1.PhoneNO
From Table1 T1, Table2 T2
where T1.PhoneNo = T2.PhoneNo (+)
and (T2.PhoneNo is Null)

Try this Query and reply me

The OP is not interested in trying anything to find actual execution times. For some reason he wants a query that in principle yields M multiplied by the base-2 log of N where M and N are the number of rows in table1 and table2 respectively, regardless of how well it performs in practice, and without "thinking too much in terms of DB". Presumably the idea is that this will automatically produce the most efficient approach. Not being a mathematician I have no idea what that means despite Ross' patient explanation. I have to say it is a novel approach to performance tuning and one I don't see working at all.

btw the outer join is not the same query unless both phone number columns are known to be unique. The most reliable way to ensure this would be to have unique constraints in place, which would imply indexes being present, and that affects the tuning options considerably, not to mention thinking in terms of DB which we are supposed to avoid.
Re: Telephone problem [message #214079 is a reply to message #214008] Sun, 14 January 2007 19:35 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
The question was not well phrased in the first place, because it was asked with a PARTICULAR solution in mind. Perhaps a better way to ask it would be:

Alternate Phrasing of the question
TABLE1 and TABLE2, of M rows and N rows respectivey, each contain a column PHONE_NO, which is indexed on both tables but not necessarily unique.

In future, N will remain static, and M - currently 30,000,000 - will grow at a constant rate of 1M per year.

Provide a SQL that lists all of the rows from TABLE1 that have phone numbers not existing in TABLE2.

Performance wise, the solution should ensure linear increases in IO as M increases (or more precicely for mathematicians: "as M approaches infinity"), and should work for any arbitrary (but static) value of N.


Ross Leishman

Re: Telephone problem [message #214133 is a reply to message #214008] Mon, 15 January 2007 01:57 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
I believe the assumption is that regardless of the exectution speeds of the variaous queries on the current set of data, in the long run, as the size of the tables grows, the solution that scales best will become the best solution.

Re: Telephone problem [message #214174 is a reply to message #214133] Mon, 15 January 2007 04:52 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Where's Littlefoot when you need him with a smilie touching his nose with an index finger - a-la-charades
Re: Telephone problem [message #214183 is a reply to message #214174] Mon, 15 January 2007 05:47 Go to previous messageGo to next message
Littlefoot
Messages: 20895
Registered: June 2005
Location: Croatia, Europe
Senior Member
Account Moderator
I'm not sure will those be satisfactory, but here they are: this one is pointing ./fa/1985/0/, but this one (ha, ha) is also being more productive ./fa/1986/0/.
Re: Telephone problem [message #214326 is a reply to message #214183] Mon, 15 January 2007 20:01 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Dude, you are a freak, and I mean that in the nicest possible way.

I reckon the first one is actually "shhhhh-ing", not touching its nose, but its close enough for me. The second one..... well, the less said the better.
Re: Telephone problem [message #214433 is a reply to message #214326] Tue, 16 January 2007 07:57 Go to previous message
joy_division
Messages: 4640
Registered: February 2005
Location: East Coast USA
Senior Member
The first one to me looks like Dr. Evil from Austin Powers saying "One Millllllion dollars."
Previous Topic: simple query. PLEASE GIVE ANSWER
Next Topic: Which one is more effective?
Goto Forum:
  


Current Time: Tue Dec 06 00:26:09 CST 2016

Total time taken to generate the page: 0.11130 seconds