Re: Simple SQL Puzzle

From: Brian Tkatch <SPAMBLOCK.Maxwell_Smart_at_ThePentagon.com.SPAMBLOCK>
Date: Tue, 30 Oct 2001 11:48:47 GMT
Message-ID: <3bde91e2.1631223765@news.alt.net>

How about this.

Make a third table with the same structure and INSERT INTO New_Table SELECT * FROM big_table. Then join the new table and the small table for a delete.

DELETE FROM New_Table WHERE EXISTS
(SELECT * FROM small_table WHERE myfield = New_Table.myfield);

Doing a DISTINCT on such a large tale, is going to take a long time. IIRC (and also understood it when I read it), it SELECTS all the records and sorts them (same as with an ORDER BY). Depending on the memory allocated, this may go to disk, which is a thousand times slower.

I just did a COUNT(DISTINCT) on such a table (> 1 million records), but with the help of some WHERE clauses I got it toned down to run in less than a minute.

Brian

On 30 Oct 2001 04:17:14 -0800, chrisjbrady_at_yahoo.com (Chris Brady) wrote:

>Niall - many thanks for responding so quickly. In fact the reason I
>asked is that I need to extract data from one table (table1) that is
>not in table2. Both tables have identical structure but the contents
>of table2 is a subset of table1.
>
>However I had a suspicion that this didn't work as expected, hence the
>original question:
>
>SQL> select count(distinct(i.myfield)) from table1 i, table2 w
> where i.myfield <> w.myfield;
>
>Instead I've used:
>
>SQL> select distinct(i.myfield) from table1 i
> where i.myfield not in
> (select distinct(w.myfield) from table2 w);
>
>The only problem with half a million records in table1 and a few
>thousand in table 2 is that this take three hours to run!! I think
>that this does the second select for every record from the first
>select!!
>
>By-the-way I'm a newbie to Oracle and SQL as you might have realised!!
>
>Chris B.
>
>"Niall Litchfield" <n-litchfield_at_audit-commission.gov.uk> wrote in message news:<3bdd3db5$0$225$ed9e5944_at_reading.news.pipex.net>...
>> Comments below
>> "Chris Brady" <chrisjbrady_at_yahoo.com> wrote in message
>> news:51604466.0110290314.226d2546_at_posting.google.com...
>> > Please can someone explain this? Its the same table compared with
>> > itself. The first case I can understand but why is the second case
>> > giving the same total? I would have thought that it would be zero.
>> <snip>
>> > SQL> select count(distinct(i.myfield)) from temp_data i, temp_data w
>> > where i.myfield <> w.myfield;
>> >
>> > COUNT(DISTINCT(I.MYFIELD))
>> > --------------------------
>> > 47548
>>
>> Your table has 47548 distinct values of myfield. Each of those is clearly
>> not equal to the other 47547 values in the table. Thus each value of myfield
>> will satisfy your where clause at least 47547 times. Assuming myfield is not
>> in fact unique try taking the distinct keyword out of the two pieces of sql
>> to see the results more clearly.
Received on Tue Oct 30 2001 - 05:48:47 CST