Re: not in VS not exists

From: Thomas Kyte <tkyte_at_us.oracle.com>
Date: Sun, 14 Nov 1999 09:52:26 -0500
Message-ID: <6ccuOHl1o2azn3QBxxLasW4Ibdiz@4ax.com>

A copy of this was sent to "R.W. Fairbairn" <richard.fairbairn_at_tesco.net> (if that email address didn't require changing) On Sun, 14 Nov 1999 10:27:00 -0000, you wrote:

>What is the difference between these two?
>

Logically, they operate differently. consider:

scott_at_8.0> create table t1 ( x int );
Table created.

scott_at_8.0> create table t2 ( y int );
Table created.

scott_at_8.0> insert into t1 values ( 1 );
scott_at_8.0> insert into t2 values ( 2 );
scott_at_8.0> insert into t2 values ( null );

scott_at_8.0> select * from t1 where x not in ( select y from t2 ); no rows selected

scott_at_8.0> select * from t1 where not exists ( select 1 from t2 where y=x );

The first one returns zero rows always (due to the inclusion of a NULL in the subquery). Tri-valued logic returns "unknown" when it asks "is x not in ( NULL )". therefore, no rows are returned.

In the second query -- it is asking a much more specific question -- Return a row such that there exists NO exact match in the other table.

So, logically they are very different. Physically they are different as well (how they get processed). Typically a NOT IN is very expensive to process as the subquery tends to get evaluated once per row from the outer query (with RBO this is the case, with the CBO -- it might not be but usually is). The entire set of values to 'not be in' is generated. You can see this with a query like:

tkyte_at_8.0> create table emp as select * from scott.emp; Table created.

tkyte_at_8.0> create index emp_sal_idx on emp(sal); Index created.

tkyte_at_8.0> set autotrace on

tkyte_at_8.0> select * from emp where sal not in ( select sal from emp ); no rows selected

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE 1 0 FILTER

   2    1     TABLE ACCESS (FULL) OF 'EMP'
   3    1     TABLE ACCESS (FULL) OF 'EMP'

Statistics

          0  recursive calls
         45  db block gets
         15  consistent gets

tkyte_at_8.0> select * from emp where not exists ( select 1 from emp b where b.sal = emp.sal );

no rows selected

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE 1 0 FILTER

   2    1     TABLE ACCESS (FULL) OF 'EMP'
   3    1     INDEX (RANGE SCAN) OF 'EMP_SAL_IDX' (NON-UNIQUE)

Statistics

          0  recursive calls
          3  db block gets
          3  consistent gets

The not exists -- using the index -- did less work here (3 db block gets vs 45). On a large subquery, or a subquery that is expensive to compute, this can be a very large performance impact.

If the logic permits it, i prefer to use NOT EXISTS when possible. Another alternative is an outer join. Using t1 and t2 again from above with different data we see:

tkyte_at_8.0> insert into t1 values (1);
tkyte_at_8.0> insert into t1 values (2);
tkyte_at_8.0> insert into t2 values (2);
tkyte_at_8.0> insert into t2 values (null);

tkyte_at_8.0> select t1.* from t1, t2 where t1.x = t2.y(+) and t2.y is null;

tkyte_at_8.0> select * from t1 where x not in ( select y from t2 );

no rows selected

tkyte_at_8.0> select * from t1 where not exists ( select 1 from t2 where y=x );

An outer join works much like a NOT EXISTS logically. The NOT EXISTS may be more performant however since it stops evaluating the subquery as soon as it finds a row that does match whereas the join will evaulate all rows and then throw out matches.

--
See http://osi.oracle.com/~tkyte/ for my columns 'Digging-in to Oracle8i'... Current article is "Part I of V, Autonomous Transactions" updated June 21'st

Thomas Kyte                   tkyte_at_us.oracle.com
Oracle Service Industries     Reston, VA   USA

Opinions are mine and do not necessarily reflect those of Oracle Corporation Received on Sun Nov 14 1999 - 08:52:26 CST