Re: Why does this query do this?

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Sat, 12 Jul 2008 07:14:12 -0700 (PDT)
Message-ID: <5324b7a2-5a50-4cb2-acae-fc8c11aeaead@2g2000hsn.googlegroups.com>


On Jul 11, 10:51 pm, Mtek <m..._at_mtekusa.com> wrote:
> Ok, I want to get a list of items in TABLE A which do not exist in
> TABLE B.  Simple right?
>
> SELECT id FROM table_a WHERE id NOT IN (SELECT id FROM table_b);
>
> Guess what?  It does not work.  I get nothing.  If I use IN, it
> returns things, but if I use NOT IN I get nothing However, if I do
> this:
>
> SELECT id FROM table_a WHERE id NOT IN (SELECT id FROM table_b WHERE
> id IS NOT NULL);
>
> Well, it works fine.  Why do I need a NOT NULL on my subquery???
>
> Thanks,
>
> John.

An interesting problem that is possibly answered by looking at the access predicates in DBMS_Xplan output for the different queries. The following tests were performed on Oracle 11.1.0.6, which has a null aware anti-join, so some of the plans may look a little different on your database version:
First, the set up:
CREATE TABLE T1 (C1 NUMBER(10));
CREATE TABLE T2 (C1 NUMBER(10));

INSERT INTO T1 VALUES (2468);
INSERT INTO T1 VALUES (3690);
INSERT INTO T1 VALUES (1357);

INSERT INTO T2 VALUES (3690);
INSERT INTO T2 VALUES (1234);
INSERT INTO T2 VALUES (NULL);
INSERT INTO T2 VALUES (1357);

INSERT INTO T1 VALUES (NULL);
INSERT INTO T1 VALUES (8900); COMMIT; The first SQL statement:
SELECT
  C1
FROM
  T1
WHERE
  C1 NOT IN (
    SELECT
      C1
    FROM
      T2);

no rows selected

The DBMS_Xplan for the above query (right-most columns removed to prevent line wrapping):



| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time |

|* 1 | HASH JOIN ANTI NA | | 1 | 1 | 0 | 00:00:00.01 |
| 2 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 | 00:00:00.01 |
| 3 | TABLE ACCESS FULL| T2 | 1 | 4 | 3 | 00:00:00.01 |

Predicate Information (identified by operation id):


   1 - access("C1"="C1")

An interesting access predicate. I wonder what happens when T1.C1 containing 2468 is compared with T2.C1 containing NULL, would the comparison be true, false, or unknown? Would the anti of the comparison be false, true, or known (respectively)?

Trying the same query again with a /*+ NO_QUERY_TRANSFORMATION */ hint



| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time |

|* 1 | FILTER | | 1 | | 0 | 00:00:00.01 |
| 2 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 | 00:00:00.01 |
|* 3 | TABLE ACCESS FULL| T2 | 5 | 1 | 5 | 00:00:00.01 |

Predicate Information (identified by operation id):


   1 - filter( IS NULL)
   3 - filter(LNNVL("C1"<>:B1))

Step 3 has another interesting access predicate, but the access predicate for step 1 is also interesting. What happens when T1.C1 is compared to see if it is not equal to a NULL in T2.C1?

With the NOT NULL added in the subquery: SELECT
  C1
FROM
  T1
WHERE
  C1 NOT IN (
    SELECT
      C1
    FROM
      T2
    WHERE C1 IS NOT NULL);         C1


      2468
      8900

---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-
Time |

|* 1 | HASH JOIN ANTI SNA| | 1 | 1 | 2 | 00:00:00.01 |
| 2 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 | 00:00:00.01 |
|* 3 | TABLE ACCESS FULL| T2 | 1 | 3 | 3 | 00:00:00.01 |

Predicate Information (identified by operation id):


   1 - access("C1"="C1")
   3 - filter("C1" IS NOT NULL)

With a /*+ NO_QUERY_TRANSFORMATION */ hint



| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time |

|* 1 | FILTER | | 1 | | 2 | 00:00:00.01 |
| 2 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 | 00:00:00.01 |
|* 3 | TABLE ACCESS FULL| T2 | 5 | 1 | 3 | 00:00:00.01 |

Predicate Information (identified by operation id):


   1 - filter( IS NULL)
   3 - filter(("C1" IS NOT NULL AND LNNVL("C1"<>:B1)))

A common variation to avoid the common performance hit of a subquery in older releases of Oracle (8i and older, or where the NO_QUERY_TRANSFORMATION hint is used):
SELECT
  T1.C1
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1(+)
  AND T2.C1 IS NULL;         C1



(NULL)
      2468
      8900

The row containing the NULL was completely unexpected.



| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time |

|* 1 | HASH JOIN ANTI | | 1 | 1 | 3 | 00:00:00.01 |
| 2 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 | 00:00:00.01 |
|* 3 | TABLE ACCESS FULL| T2 | 1 | 3 | 3 | 00:00:00.01 |

Predicate Information (identified by operation id):


   1 - access("T1"."C1"="T2"."C1")
   3 - filter("T2"."C1" IS NOT NULL)

With a /*+ NO_QUERY_TRANSFORMATION */ hint



| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time |

|* 1 | FILTER | | 1 | | 3 | 00:00:00.01 |
|* 2 | HASH JOIN OUTER | | 1 | 1 | 5 | 00:00:00.01 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 | 00:00:00.01 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 4 | 4 | 00:00:00.01 |

Predicate Information (identified by operation id):


   1 - filter("T2"."C1" IS NULL)
   2 - access("T1"."C1"="T2"."C1")

The minus syntax suggested by Michael O'Shea seems to get it right, and might be one way that I would suggest to attack the problem: SELECT
  C1
FROM
  T1
MINUS
SELECT
  C1
FROM
  T2;

        C1


      2468
      8900

----------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-
Time |

| 1 | MINUS | | 1 | | 2 | 00:00:00.01 |
| 2 | SORT UNIQUE | | 1 | 5 | 5 | 00:00:00.01 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 | 00:00:00.01 |
| 4 | SORT UNIQUE | | 1 | 4 | 4 | 00:00:00.01 |
| 5 | TABLE ACCESS FULL| T2 | 1 | 4 | 4 | 00:00:00.01 |

However, the SORT UNIQUE in the plan makes me wonder what happens if non-unique values are permitted in T1.C1: INSERT INTO T1 VALUES (1357);
INSERT INTO T1 VALUES (2468); SELECT
  C1
FROM
  T1
MINUS
SELECT
  C1
FROM
  T2;

        C1


      2468
      8900

SELECT
  C1
FROM
  T1
WHERE
  C1 NOT IN (
    SELECT
      C1
    FROM
      T2
    WHERE C1 IS NOT NULL);         C1


      2468
      2468
      8900

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Sat Jul 12 2008 - 09:14:12 CDT

Original text of this message