Re: Improve query performance

From: Daniel Morgan <damorgan_at_x.washington.edu>
Date: Wed, 10 Dec 2003 08:55:02 -0800
Message-ID: <1071075338.477755@yasure>

Anurag Varma wrote:

> "VC" <boston103_at_hotmail.com> wrote in message news:kWuBb.349711$ao4.1171224_at_attbi_s51...
>

>>Hello Anurag,
>>
>>"Anurag Varma" <avdbi_at_hotmail.com> wrote in message
>>news:DmuBb.5933$qe6.5310_at_news02.roc.ny...
>>
>>>A "not in" query might perform differently with RBO.
>>>With CBO and all tables analyzed, "NOT IN" is executed the same exact way
>>
>>as "NOT EXISTS".
>>
>>>I've tried it a lot of times and find this to be true 8i onwards.
>>>
>>>In fact Harrison notes this fact in this SQL Tuning book.
>>>Do you have a working example to prove this the other way?
>>
>>drop table t1;
>>drop table t2;
>>create table t1 as select * from all_objects;
>>create index t1_idx on t1(object_id);
>>create table t2 as select * from all_objects where rownum <= 500;
>>analyze table t1 compute statistics;
>>analyze table t1 compute statistics;
>>set autot trace explain
>>
>>select * from t2 where object_id not in (select object_id from t1);
>>
>>Execution Plan
>>----------------------------------------------------------
>> 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1 Bytes=132)
>> 1 0 HASH JOIN (ANTI) (Cost=12 Card=1 Bytes=132)
>> 2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=3 Card=899 Bytes=115072)
>> 3 1 INDEX (FAST FULL SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=7
>>Card=25324 Bytes=101296)
>>
>>select * from t2 where not exists (select 1 from t1 where
>>object_id=t2.object_id);
>>
>>Execution Plan
>>----------------------------------------------------------
>> 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=48 Card=45 Bytes=5760)
>>
>> 1 0 FILTER
>> 2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=3 Card=45 Bytes=5760)
>> 3 1 INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card =1
>>Bytes=4)
>>
>>Rgds.
>>
>>
>>>Anurag
>>>
>>>
>>
>>

> 
> Well I'll admit that proves my statement incorrect.
> Although in most cases I've seen, the execution plans were identical.
> 
> In this case the not in is actually performing better and if the CBO was smarter
> it would/should have chosen the "not in" plan for the "not exists" query considering the
> join columns are "not null", which makes the two queries identical.
> 
> Anurag

In my testing the execution plans are often different but that doesn't change the fact that it is a myth. I have a test case I use with my students that contains the following three statements:

SELECT srvr_id
FROM servers
WHERE srvr_id IN (

SELECT srvr_id
FROM serv_inst);

SELECT srvr_id
FROM servers s
WHERE EXISTS (
    SELECT srvr_id
    FROM serv_inst i
    WHERE s.srvr_id = i.srvr_id);

SELECT DISTINCT srvr_id
FROM servers
WHERE srvr_id NOT IN (

    SELECT srvr_id
    FROM servers
    MINUS
    SELECT srvr_id
    FROM serv_inst);

All have different plans and I don't think anyone would want to guess which has the lower cost.

-- 
Daniel Morgan
http://www.outreach.washington.edu/ext/certificates/oad/oad_crs.asp
http://www.outreach.washington.edu/ext/certificates/aoa/aoa_crs.asp
damorgan_at_x.washington.edu
(replace 'x' with a 'u' to reply)

Received on Wed Dec 10 2003 - 10:55:02 CST