Re: Simple SQL Puzzle

From: Sybrand Bakker <postbus_at_sybrandb.demon.nl>
Date: Tue, 30 Oct 2001 13:30:28 +0100
Message-ID: <ttt75ckiqna789@corp.supernews.com>

First of all
You should remove the distinct keyword from your subquery in () is denoting a set, and all elements of a set are unique by *formal definition*.
In your current situation Oracle performs the same operation two times, the optimizer is too stupid to recognize your distinct is redundant. It will perform a sort unique for you, and your in will also perform a sort unique for you.

That said, assuming you use 7.3.3 or higher, if you have 'myfield' indexed and the tables analyzed, the optimizer will run automatically run a index fast full scan for your subquery. This operation just sequentially scans the index.
Currently you most likely are using a full table scan which results in disastrous performance

A different shortcut, usually saving much time, although it may not apply in your case, see the recent Thomas Kyte post on this subject, is using minus (in conjuction with the index earlier mentioned) instead of not in.

But first retry the operation without the redundant distinct

Hth

--
Sybrand Bakker
Senior Oracle DBA

to reply remove '-verwijderdit' from my e-mail address


"Chris Brady" <chrisjbrady_at_yahoo.com> wrote in message
news:51604466.0110300417.6f1f05d9_at_posting.google.com...


> Niall - many thanks for responding so quickly. In fact the reason I

> asked is that I need to extract data from one table (table1) that is

> not in table2. Both tables have identical structure but the contents

> of table2 is a subset of table1.

>

> However I had a suspicion that this didn't work as expected, hence the

> original question:

>

> SQL> select count(distinct(i.myfield)) from table1 i, table2 w

>      where i.myfield <> w.myfield;

>

> Instead I've used:

>

> SQL> select distinct(i.myfield) from table1 i

>      where i.myfield not in

>      (select distinct(w.myfield) from table2 w);

>

> The only problem with half a million records in table1 and a few

> thousand in table 2 is that this take three hours to run!! I think

> that this does the second select for every record from the first

> select!!

>

> By-the-way I'm a newbie to Oracle and SQL as you might have realised!!

>

> Chris B.

>

> "Niall Litchfield" <n-litchfield_at_audit-commission.gov.uk> wrote in message

news:<3bdd3db5$0$225$ed9e5944_at_reading.news.pipex.net>...


> > Comments below

> > "Chris Brady" <chrisjbrady_at_yahoo.com> wrote in message

> > news:51604466.0110290314.226d2546_at_posting.google.com...

> > > Please can someone explain this? Its the same table compared with

> > > itself. The first case I can understand but why is the second case

> > > giving the same total? I would have thought that it would be zero.

> >  <snip>

> > > SQL> select count(distinct(i.myfield)) from temp_data i, temp_data w

> > >      where i.myfield <> w.myfield;

> > >

> > > COUNT(DISTINCT(I.MYFIELD))

> > > --------------------------

> > >                      47548

> >

> > Your table has 47548 distinct values of myfield. Each of those is

clearly


> > not equal to the other 47547 values in the table. Thus each value of

myfield


> > will satisfy your where clause at least 47547 times. Assuming myfield is

not


> > in fact unique try taking the distinct keyword out of the two pieces of

sql


> > to see the results more clearly.

Received on Tue Oct 30 2001 - 06:30:28 CST