Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: not in VS not exists
A copy of this was sent to "R.W. Fairbairn" <richard.fairbairn_at_tesco.net>
(if that email address didn't require changing)
On Sun, 14 Nov 1999 10:27:00 -0000, you wrote:
>What is the difference between these two?
>
Logically, they operate differently. consider:
scott_at_8.0> create table t1 ( x int );
Table created.
scott_at_8.0> create table t2 ( y int );
Table created.
scott_at_8.0> insert into t1 values ( 1 ); scott_at_8.0> insert into t2 values ( 2 ); scott_at_8.0> insert into t2 values ( null );
scott_at_8.0> select * from t1 where x not in ( select y from t2 ); no rows selected
scott_at_8.0> select * from t1 where not exists ( select 1 from t2 where y=x );
X
1
The first one returns zero rows always (due to the inclusion of a NULL in the subquery). Tri-valued logic returns "unknown" when it asks "is x not in ( NULL )". therefore, no rows are returned.
In the second query -- it is asking a much more specific question -- Return a row such that there exists NO exact match in the other table.
So, logically they are very different. Physically they are different as well (how they get processed). Typically a NOT IN is very expensive to process as the subquery tends to get evaluated once per row from the outer query (with RBO this is the case, with the CBO -- it might not be but usually is). The entire set of values to 'not be in' is generated. You can see this with a query like:
tkyte_at_8.0> create table emp as select * from scott.emp; Table created.
tkyte_at_8.0> create index emp_sal_idx on emp(sal); Index created.
tkyte_at_8.0> set autotrace on
tkyte_at_8.0> select * from emp where sal not in ( select sal from emp ); no rows selected
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE 1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'EMP' 3 1 TABLE ACCESS (FULL) OF 'EMP'
Statistics
0 recursive calls 45 db block gets 15 consistent gets
tkyte_at_8.0> select * from emp where not exists ( select 1 from emp b where b.sal = emp.sal );
no rows selected
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE 1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'EMP' 3 1 INDEX (RANGE SCAN) OF 'EMP_SAL_IDX' (NON-UNIQUE)
Statistics
0 recursive calls 3 db block gets 3 consistent gets
The not exists -- using the index -- did less work here (3 db block gets vs 45). On a large subquery, or a subquery that is expensive to compute, this can be a very large performance impact.
If the logic permits it, i prefer to use NOT EXISTS when possible. Another alternative is an outer join. Using t1 and t2 again from above with different data we see:
tkyte_at_8.0> insert into t1 values (1); tkyte_at_8.0> insert into t1 values (2); tkyte_at_8.0> insert into t2 values (2); tkyte_at_8.0> insert into t2 values (null);
tkyte_at_8.0> select t1.* from t1, t2 where t1.x = t2.y(+) and t2.y is null;
X
1
tkyte_at_8.0> select * from t1 where x not in ( select y from t2 );
no rows selected
tkyte_at_8.0> select * from t1 where not exists ( select 1 from t2 where y=x );
X
1
An outer join works much like a NOT EXISTS logically. The NOT EXISTS may be more performant however since it stops evaluating the subquery as soon as it finds a row that does match whereas the join will evaulate all rows and then throw out matches.
--
See http://osi.oracle.com/~tkyte/ for my columns 'Digging-in to Oracle8i'...
Current article is "Part I of V, Autonomous Transactions" updated June 21'st
Thomas Kyte tkyte_at_us.oracle.com Oracle Service Industries Reston, VA USA
Opinions are mine and do not necessarily reflect those of Oracle Corporation Received on Sun Nov 14 1999 - 08:52:26 CST