Re: Why doesnt this query work?? Makes no sense????

From: <ddf_dba_at_my-deja.com>
Date: 2000/07/03
Message-ID: <8jqe83$nt2$1@nnrp1.deja.com>

In article <8jbha5$8eo$1_at_nnrp1.deja.com>, Thomas J. Kyte <tkyte_at_us.oracle.com> wrote:
> In article <3959413e.5520978_at_news.mindspring.com>,
> nitefrog_at_yahoo.com wrote:
> > Why does this query not return any rows? I cant figure this one
out?
> > Also I want to note that the tablespaces are full or very close to
> > being according to the storage manager. Will this cause the
problem?
> > Also this query takes for ever to run as well. Any help is greatly
> > appreciated.
> >
> > select patientid
> > from B
> > where patientid not in (select patientid from A);
> >
> > TABLE A - patientid
> > -------
> > 1
> > 2
> > 3
> > 4
> >
> > TABLE B - patientid
> > -------
> > 1
> > 2
> > 3
> > 4
> > 5
> > 6
> > 7
> >
> > Thanks,
> >
> > Kev.-
>
> you have a NULL patientid in table A. When you say "where x not in
> ( ....) " and the set contains a NULL -- it is "unknown" whether x is
> not in that set. Here is an example showing your issue and then the
> solutions are below. This also explains why its "slow". We are doing
> a full scan of table A for *every* row in B. We cannot use an index
as
> NULLS are not indexed in conventional b*trees. Once we use the
correct
> query and the CBO, it'll do much much better:
>
> ops$tkyte_at_8i> create table t1 ( x int, constraint t1_pk primary
key(x));
> Table created.
>
> ops$tkyte_at_8i> create table t2 ( x int, constraint t2_unique unique(x)
);
> Table created.
>
> ops$tkyte_at_8i> insert into t1 values ( 1 );
> 1 row created.
>
> ops$tkyte_at_8i> insert into t1 values ( 2 );
> 1 row created.
>
> ops$tkyte_at_8i> insert into t2 values ( 1 );
> 1 row created.
>
> ops$tkyte_at_8i> select * from t1 where x not in ( select x from t2 );
>
> X
> ----------
> 2
>
> There we get what we expect but as soon as we stick a single NULL in
> there:
>
> ops$tkyte_at_8i> insert into t2 values ( NULL );
> 1 row created.
>
> ops$tkyte_at_8i> select * from t1 where x not in ( select x from t2 );
> no rows selected
>
> That query will return 0 rows for ever.
>
> ops$tkyte_at_8i> set autotrace on explain
> ops$tkyte_at_8i> select * from t1 where x not in ( select x from t2 where
> x is NOT NULL );
>
> X
> ----------
> 2
>
> Execution Plan
> ----------------------------------------------------------
> 0 SELECT STATEMENT Optimizer=CHOOSE
> 1 0 FILTER
> 2 1 TABLE ACCESS (FULL) OF 'T1'
> 3 1 TABLE ACCESS (FULL) OF 'T2'
>
> That shows how the get the right answer, unfortunately it'll be slow,
> (full scan of t2 for every row in t1). So, lets flip over to CBO:
>
> ops$tkyte_at_8i> select /*+ first_rows */ * from t1 where x not in (
> select x from t2 where x is NOT NULL );
>
> X
> ----------
> 2
>
> Execution Plan
> ----------------------------------------------------------
> 0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=26 Card=5
> Bytes=130)
>
> 1 0 INDEX (FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=26 Card=5 Byte
> s=130)
>
> 2 1 INDEX (UNIQUE SCAN) OF 'T2_UNIQUE' (UNIQUE) (Cost=1 Card
> =1 Bytes=26)
>
> Much better -- it'll still full scan t1 (but using the index this time
-
> - great if there is lots of other columns in t1) and it'll do an index
> probe into t2 for each row in t1. You can also use a "hash anti-join"
> hint instead -- select /*+ HASH_AJ */ ..... to get a different,
> perhaps faster, plan
>
> Another way to do the above with the RBO is to use where exists.
It'll
> full scan t1 and index probe into t2 for each row:
>
> ops$tkyte_at_8i> select * from t1 where NOT EXISTS ( select null from t2
> where t2.x = t1.x );
>
> X
> ----------
> 2
>
> Execution Plan
> ----------------------------------------------------------
> 0 SELECT STATEMENT Optimizer=CHOOSE
> 1 0 FILTER
> 2 1 TABLE ACCESS (FULL) OF 'T1'
> 3 1 INDEX (UNIQUE SCAN) OF 'T2_UNIQUE' (UNIQUE)
>
> And for the last one:
>
> ops$tkyte_at_8i> select x from t1
> 2 minus
> 3 select x from t2
> 4 /
>
> X
> ----------
> 2
>
> Execution Plan
> ----------------------------------------------------------
> 0 SELECT STATEMENT Optimizer=CHOOSE
> 1 0 MINUS
> 2 1 SORT (UNIQUE)
> 3 2 TABLE ACCESS (FULL) OF 'T1'
> 4 1 SORT (UNIQUE)
> 5 4 TABLE ACCESS (FULL) OF 'T2'
>
> is yet another way to get the same result.
>
> ops$tkyte_at_8i> set autotrace off
> >
>
> --
> Thomas Kyte (tkyte_at_us.oracle.com) Oracle Service Industries
> Howtos and such: http://osi.oracle.com/~tkyte/index.html
> Oracle Magazine: http://www.oracle.com/oramag
> Opinions are mine and do not necessarily reflect those of Oracle Corp
>
> Sent via Deja.com http://www.deja.com/
> Before you buy.
>

I don't know which version of Oracle you are running, Tom, but I cannot get Oracle to exhibit the same behavior on 8.1.5:

SQL> create table a(patientid int, constraint a_pk primary key(patientid));

Table created.

SQL> create table b(patientid int, constraint b_uq unique(patientid));

Table created.

Using SQL*Loader I populated both tables a and b; the results of the data loads are shown below:

SQL> select *
2 from a
3 /

PATIENTID

8 rows selected.

SQL> select *
2 from b
3 /

PATIENTID

17 rows selected.

Running the query in question:

SQL> select patientid
2 from b
3 where patientid not in (select patientid from a) 4 /

PATIENTID

9 rows selected.

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE 1 0 FILTER

   2    1     TABLE ACCESS (FULL) OF 'B'
   3    1     TABLE ACCESS (FULL) OF 'A'

Statistics

          0  recursive calls
         72  db block gets
         19  consistent gets
          0  physical reads
          0  redo size
       1389  bytes sent via SQL*Net to client
        708  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed

Now I add a NULL to table b:

SQL> insert into b
  2> values
  3> (null);

1 row created;

and I run the query again:

SQL> select patientid
2 from b
3 where patientid not in (select patientid from a) 4 /

PATIENTID

9 rows selected.

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE 1 0 FILTER

   2    1     TABLE ACCESS (FULL) OF 'B'
   3    1     TABLE ACCESS (FULL) OF 'A'

Statistics

          0  recursive calls
         76  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
       1401  bytes sent via SQL*Net to client
        708  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed

I experience no performance problems, and I do not get "no rows selected" as the result set.

If I select all rows from b this proves that a NULL value exists in the table:

SQL> select *
2 from b
3 /

PATIENTID

        12
        13
        14
        15
        16
        17
             <--- NULL value recently inserted

18 rows selected.

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE 1 0 TABLE ACCESS (FULL) OF 'B' Statistics

          0  recursive calls
          4  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
       1833  bytes sent via SQL*Net to client
        762  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         18  rows processed

The NULL value does not appear to affect the query speed or the results as you state. Again, maybe you're on a different release of Oracle.

--
David Fitzjarrell
Oracle Certified DBA


Sent via Deja.com http://www.deja.com/
Before you buy.

Received on Mon Jul 03 2000 - 00:00:00 CDT