Re: Why does NOT IT take so long

From: Thomas J Kyte <tkyte_at_us.oracle.com>
Date: 1995/07/28
Message-ID: <3vaq0m$h9k_at_inet-nntp-gw-1.us.oracle.com>


dharris_at_uk.oracle.com (Dave Harris) wrote:

>snowden_at_haven.ios.com (Roger Snowden) wrote:
 

>>Jim Smith (Jim_at_jimsmith.demon.co.uk) wrote:
>>: In article <3uu80q$2nj_at_explorer.csc.com> tconder_at_csc.com "Tom Conder" writes:
>>: > When writing Oracle SQL, why does the NOT IN statement take so long?
 

>>: ... where this_id not in
>>: (select other_id from other table...)
 

>>I wonder if you could clarify this for me. I have a similar situation
>>where the sub-query takes less than two seconds in sqlplus to get 250
>>rows. The main table has about 100,000 rows. Does the subquery actually
>>execute for each row of the main table? For example:
 

>>select count(*) from dbown.donor where
>>dbown.donor.donorid not in (
>>select * from dbown.flagview)
 

>>dbown.flagview only has 250 rows. Does this not get chached and compared
>>to dbown.donor?
 

>>Roger
 

>the example:
 

>select count(*) from dbown.donor where
>dbown.donor.donorid not in (
>select * from dbown.flagview)
 

>this sql statement will cause 2 full table scans.
>the sql will be executed as follows:
 

>for every dbown.donor row perform a full table scan on
>dbown.flagview
>
>try using:
 

>select count(*)
>from dbown.donor d
>where not exists
> (select 1
> from dbown.flagview f
> where d.donorid = f.donorid)
 

>if the donorid is indexed on the dbown.flagview table then a full
>table scan can be avoided.
 

>Dave Harris

Perhaps the easiest ways to diagnose these types of issues is to use the TKPROF utility. For example, I created two tables, T1 and T2. Filled T1 with 27 rows and T2 with 1800 rows. Then I ran the following queries in sqlplus after issueing "alter session set sql_trace=true". Once you run tkprof on the trace file, you get a printout that shows the trace output at the bottom of this note.

All three queries I ran are "identical" in that they ask the same question. They are very different performance wise. As a rule of thumb for "not in" type of questions:

  • if the table in the subquery is SMALL and the table on the outside is LARGE, not in may be very efficient since the entire table will get cached and subsequent scans will be of little cost
  • if the table in the subquery is LARGE and the table on the outside is SMALL, not exists will be the best bet ASSUMING the columns you are "not in"ing on are indexed in the subquery table
  • if the table in the subquery is LARGE and the table on the outside is LARGE, an outer join with a check for NULL on the 'inner' table may be best, ASSUMING again the columns you are "not in"ing on are indexed.
  • if both tables are SMALL, who cares, it runs fast any which way you do it.

select *
from
 t1, t2 where t1.x = t2.x (+) and t2.x is null

call count cpu elapsed disk query current rows -------- ------- -------- --------- -------- -------- ------- ----------

Parse           1      0.03       0.03        0        0       0           0
Execute         1      0.00       0.00        0        0       0           0
Fetch           1      0.01       0.02        0       82       3          10

--------  -------  --------  --------- -------- -------- -------  ----------
total           3      0.04       0.05        0       82       3          10

Misses in library cache during parse: 1
Optimizer hint: CHOOSE
Parsing user id: 144 (TKYTE)

Rows Execution Plan

-------  ---------------------------------------------------
      0  SELECT STATEMENT   OPTIMIZER HINT: CHOOSE
     27    FILTER
     17      NESTED LOOPS (OUTER)
     27        TABLE ACCESS (FULL) OF 'T1'
     27        INDEX (UNIQUE SCAN) OF 'SYS_C003174' (UNIQUE)




******************************************************************************


******************************************************************************

select *
from
 t1 where x not in ( select x from t2 )

call count cpu elapsed disk query current rows -------- ------- -------- --------- -------- -------- ------- ----------

Parse           1      0.01       0.01        0        0       0           0
Execute         1      0.00       0.00        0        0       0           0
Fetch           1      0.78       0.78        0      214      84          10

--------  -------  --------  --------- -------- -------- -------  ----------
total           3      0.79       0.79        0      214      84          10

Misses in library cache during parse: 0
Optimizer hint: CHOOSE
Parsing user id: 144 (TKYTE)

Rows Execution Plan

-------  ---------------------------------------------------
      0  SELECT STATEMENT   OPTIMIZER HINT: CHOOSE
     27    FILTER
     27      TABLE ACCESS (FULL) OF 'T1'
  23911      TABLE ACCESS (FULL) OF 'T2'




******************************************************************************


******************************************************************************

select *
from
 t1 where not exists ( select null from t2 where t2.x = t1.x )

call count cpu elapsed disk query current rows -------- ------- -------- --------- -------- -------- ------- ----------

Parse           1      0.01       0.07        0        0       0           0
Execute         1      0.00       0.00        0        0       0           0
Fetch           1      0.02       0.09        0       82       3          10

--------  -------  --------  --------- -------- -------- -------  ----------
total           3      0.03       0.16        0       82       3          10

Misses in library cache during parse: 0
Optimizer hint: CHOOSE
Parsing user id: 144 (TKYTE)

Rows Execution Plan

-------  ---------------------------------------------------
      0  SELECT STATEMENT   OPTIMIZER HINT: CHOOSE
     27    FILTER
     27      TABLE ACCESS (FULL) OF 'T1'
     27      INDEX (UNIQUE SCAN) OF 'SYS_C003174' (UNIQUE)




******************************************************************************


Thomas Kyte
tkyte_at_us.oracle.com
Oracle Government Received on Fri Jul 28 1995 - 00:00:00 CEST

Original text of this message