Re: Performance of (NOT) IN vs. (NOT) EXISTS

From: <tas02_at_my-deja.com>
Date: Tue, 31 Oct 2000 21:44:38 GMT
Message-ID: <8tneg3$fng$1@nnrp1.deja.com>

You will notice that the explain plan of the NOT IN version is different from the IN version in that there is no SORT (JOIN). Instead, what I believe is happening is that (in the NOT IN version) Oracle is doing a full table scan of the BUS_CLASSIF_REP table and then, FOR EVERY ROW RETURNED doing a full index scan of the SYS_C0031525 index on the CLASSIFICATION table. So, if BUS_CLASSIF_REP has a million rows and so does CLASSIFICATION, you are doing a million full index scans of the CLASSIFICATION index or a million, million reads.

I've never seen a good explanation of why this is, but if you do a trace and tkprof on this, I think you'll see that's what is happening.

I've always made it a rule NEVER to use NOT IN with a subquery. Instead, use NOT EXISTS or MINUS (which I believe shows the same on an explain plan).

Alternatively, although I've never tried this, you could try coaxing Oracle to do a hash anti-join in the NOT in subquery, something like:

SELECT *
FROM autogenerate.bus_classif_rep
WHERE fk_classification NOT IN (

 	SELECT /*+ HASH_AJ */ object_id
 	FROM autogenerate.classification);

Like I said, I've never tried this, but according to the Oracle Tuning Guide, it should make NOT IN run fast. If you try it, let us know how it works.

HTH. Tom Sullivan
ADI Consulting, Inc.

In article <wOjL5.262$7O3.8844_at_news1.online.no>, frank.andersen_at_oslo.online.no wrote:
> Hi,
>
> Below is the explain plan output from Toad running against Oracle
> 8.0.5. What is strange here is that the NOT IN has a cost equal to the
> EXISTS and NOT EXISTS but takes _a lot_ longer to execute. We are
> talking minutes against days here. I am still of the opinion that a
> NOT IN should not take more time than an IN as long as the column is
> indexed. Haven't tried the other suggestions in this thread yet, but
> maybe someone can tell me why NOT IN seems so inefficient?
>
> Thanks,
> Frank
>
> SELECT *
> FROM autogenerate.bus_classif_rep
> WHERE fk_classification IN (
> SELECT object_id
> FROM autogenerate.classification);
>
> SELECT STATEMENT Optimizer=CHOOSE (Cost=24423 Card=10666051
> Bytes=1589241599)
> MERGE JOIN (Cost=24423 Card=10666051 Bytes=1589241599)
> TABLE ACCESS (BY INDEX ROWID) OF BUS_CLASSIF_REP (Cost=826
> Card=10666051 Bytes=1226595865)
> INDEX (FULL SCAN) OF I_BUS_CLASSIF_REP3 (NON-UNIQUE) (Cost=26
> Card=10666051)
> SORT (JOIN) (Cost=12215 Card=1900396 Bytes=64613464)
> INDEX (FULL SCAN) OF SYS_C0031525 (UNIQUE) (Cost=11382
> Card=1900396 Bytes=64613464)
>
> SELECT *
> FROM autogenerate.bus_classif_rep
> WHERE fk_classification NOT IN (
> SELECT object_id
> FROM autogenerate.classification);
>
> SELECT STATEMENT Optimizer=CHOOSE (Cost=28665 Card=533303
> Bytes=61329845)
> FILTER
> TABLE ACCESS (FULL) OF BUS_CLASSIF_REP (Cost=28665 Card=533303
> Bytes=61329845)
> INDEX (FULL SCAN) OF SYS_C0031525 (UNIQUE) (Cost=11382 Card=95020
> Bytes=3230680)
>
> SELECT *
> FROM autogenerate.bus_classif_rep t1
> WHERE EXISTS (
> SELECT t2.object_id
> FROM autogenerate.classification t2
> WHERE t2.object_id = t1.fk_classification)
>
> SELECT STATEMENT Optimizer=CHOOSE (Cost=28665 Card=533303
> Bytes=61329845)
> FILTER
> TABLE ACCESS (FULL) OF BUS_CLASSIF_REP (Cost=28665 Card=533303
> Bytes=61329845)
> INDEX (UNIQUE SCAN) OF SYS_C0031525 (UNIQUE) (Cost=2 Card=1
> Bytes=34)
>
> SELECT *
> FROM autogenerate.bus_classif_rep t1
> WHERE NOT EXISTS (
> SELECT t2.object_id
> FROM autogenerate.classification t2
> WHERE t2.object_id = t1.fk_classification)
>
> SELECT STATEMENT Optimizer=CHOOSE (Cost=28665 Card=533303
> Bytes=61329845)
> FILTER
> TABLE ACCESS (FULL) OF BUS_CLASSIF_REP (Cost=28665 Card=533303
> Bytes=61329845)
> INDEX (UNIQUE SCAN) OF SYS_C0031525 (UNIQUE) (Cost=2 Card=1
> Bytes=34)
>
> >"Frank Andersen" <frank.andersen_at_oslo.online.no> wrote in message
> >news:hWZK5.2617$Yy1.49240_at_news1.online.no...
> >> Hi all,
> >>
> >> I have a query where I want to select records from (A) that exist
(and
> >> in a later query, don't exist) in another table (B) like:
> >>
> >> SELECT COUNT(*) FROM A
> >> WHERE OBJECTID IN (
> >> SELECT FK_OBJEKT_ID
> >> FROM B);
> >>
> >> This works fine and is actually faster than:
> >>
> >> SELECT COUNT(*) FROM A
> >> WHERE EXISTS (
> >> SELECT 1
> >> FROM B
> >> WHERE B.FK_OBJECTID = A.OBJECTID);
> >>
> >> A.OBJECT_ID is PK and I have an index on B.FK_OBJECT_ID.
> >>
> >> However, if I want the records from A that don't exist in B I run
into
> >> problems using NOT IN like:
> >>
> >> SELECT COUNT(*) FROM A
> >> WHERE OBJECTID NOT IN (
> >> SELECT FK_OBJEKT_ID
> >> FROM B);
> >>
> >> This statement takes forever to execute and the optimizer doesn't
seem
> >> to use a temporary, indexed collection of FK_OBJECTID values, but
> >> rather using a full scan for each record in A. If the collection of
> >> FK_OBJECTID's was ordered/indexed, it really should not take more
time
> >> to do a NOT IN than an IN. Btw, NOT EXISTS uses appr. the same
amount
> >> of time as EXISTS because it is based on the same join, but I would
> >> appreciate if someone could shed some light on this NOT IN issue
and
> >> its lack of performance.
> >>
> >> Cheers,
> >> Frank
> >>
> >>
>
>

--
Tom Sullivan
ADI Consulting, Inc.


Sent via Deja.com http://www.deja.com/
Before you buy.

Received on Tue Oct 31 2000 - 15:44:38 CST