Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: NOT IN/Nulls/ NOT EXISTS

Re: NOT IN/Nulls/ NOT EXISTS

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Sun, 08 Jul 2007 05:37:19 -0700
Message-ID: <1183898239.235579.267710@o61g2000hsh.googlegroups.com>


On Jul 7, 10:53 pm, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> On Jul 6, 4:07 am, colmkav <colmj..._at_yahoo.co.uk> wrote:
>
> > which would you normally expect to work quicker and why? ie how does
> > the using of NOT EXISTS compare with the using of Nulls
>
> > 1)
> > select count(*) from (
> > SELECT distinct sourceid, strategy, strategyname, area, product
> > FROM t_bookmap
> > WHERE SourceId = 1 AND not exists(select strategy from
> > ta_strategy where SourceId = 1 AND TA_STRATEGY.strategy =
> > T_BOOKMAP.strategy)
> > )
>
> > 2)
> > select count (*) from (
> > SELECT distinct T.sourceid, T.strategy, T.strategyname, T.area,
> > T.product
> > FROM t_bookmap T, (select strategy from ta_strategy where
> > sourceid = 1) TA
> > WHERE T.SourceId = 1 AND TA.strategy(+) = T.strategy AND
> > TA.strategy is NULL
> > )
>
> I believe that I answered this question in a different response to
> you, but did not provide a test case for verification:http://groups.google.com/group/comp.databases.oracle.misc/browse_thre...
>
> Scott offered a good suggestion - test the performance, ideally with a
> representative data set.
>
> The answer will depend in part on the Oracle version. You will
> typically find that Oracle is able to transform the SQL statement into
> a more efficient form. The tests below were performed on Oracle
> 10.2.0.3
>
> Let's set up two tables with known data, one with 100,000 rows and one
> with 50,000 rows:
> CREATE TABLE T5 (
> MY_DATE DATE NOT NULL,
> MY_NUMBER NUMBER(12,10) NOT NULL,
> MY_ROW NUMBER(12) NOT NULL);
>
> CREATE TABLE T6 (
> MY_DATE DATE NOT NULL,
> MY_NUMBER NUMBER(12,10) NOT NULL,
> MY_ROW NUMBER(12) NOT NULL);
>
> INSERT INTO
> T5
> SELECT
> TRUNC(SYSDATE)+ROWNUM,
> COS(ROWNUM/180*3.141592),
> ROWNUM
> FROM
> DUAL
> CONNECT BY
> LEVEL<=100000;
>
> COMMIT;
>
> INSERT INTO
> T6
> SELECT
> TRUNC(SYSDATE)+ROWNUM*2,
> COS(ROWNUM/180*3.141592),
> ROWNUM
> FROM
> DUAL
> CONNECT BY
> LEVEL<=50000;
>
> COMMIT;
>
> Now, let's look at the first of the options, using NOT IN syntax that
> I cautioned you against using in the other thread. For the first run,
> let's now allow Oracle to transform the SQL statement into another
> form:
>
> SELECT /*+ NO_QUERY_TRANSFORMATION */
> T5.MY_DATE,
> T5.MY_NUMBER,
> T5.MY_ROW
> FROM
> T5
> WHERE
> T5.MY_DATE NOT IN (
> SELECT DISTINCT
> T6.MY_DATE
> FROM
> T6);
>
> The DBMS Xplan looks like this:
> -------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows | A-
> Time | Buffers |
> -------------------------------------------------------------------------------------
> |* 1 | FILTER | | 1 | | 50000 |
> 00:05:45.70 | 14M|
> | 2 | TABLE ACCESS FULL| T5 | 1 | 104K| 100K|
> 00:00:00.30 | 872 |
> |* 3 | TABLE ACCESS FULL| T6 | 100K| 543 | 50000 |
> 00:05:44.64 | 14M|
> -------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - filter( IS NULL)
> 3 - filter("T6"."MY_DATE"=:B1)
>
> The above query required 5 minutes and 45 seconds to execute, which is
> how the SQL statement would likely execute on Oracle 8i. The Starts
> value of 100K indicates that the table T6 was full tablescanned
> 100,000 times - once per row in the table T5. 5 minutes and 45
> seconds is a long execution time for this SQL statement.
>
> Let's try again, allowing Oracle to transform the SQL statement:
> SELECT
> T5.MY_DATE,
> T5.MY_NUMBER,
> T5.MY_ROW
> FROM
> T5
> WHERE
> T5.MY_DATE NOT IN (
> SELECT DISTINCT
> T6.MY_DATE
> FROM
> T6);
>
> The DBMS Xplan looks like this:
> ------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows | A-
> Time | Buffers | OMem | 1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------
> |* 1 | HASH JOIN RIGHT ANTI| | 1 | 49944 | 50000 |
> 00:00:01.43 | 1075 | 1821K| 1267K| 2202K (0)|
> | 2 | TABLE ACCESS FULL | T6 | 1 | 54261 | 50000 |
> 00:00:00.10 | 203 | | | |
> | 3 | TABLE ACCESS FULL | T5 | 1 | 104K| 100K|
> 00:00:00.20 | 872 | | | |
> ------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - access("T5"."MY_DATE"="T6"."MY_DATE")
>
> Note the change in the plan. Oracle transformed the SQL statement
> into an anti-join which completed in 1.43 seconds. Remember that
> plan, we might see it again.
>
> Let's see what happens with an outer join as I recommended in the
> other thread. First, we will not allow Oracle to transform the SQL
> statement:
> SELECT /*+ NO_QUERY_TRANSFORMATION */
> T5.MY_DATE,
> T5.MY_NUMBER,
> T5.MY_ROW
> FROM
> T5,
> T6
> WHERE
> T5.MY_DATE=T6.MY_DATE(+)
> AND T6.MY_DATE IS NULL;
>
> The DBMS Xplan looks like this:
> --------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows | A-
> Time | Buffers | OMem | 1Mem | Used-Mem |
> --------------------------------------------------------------------------------------------------------------------
> |* 1 | FILTER | | 1 | | 50000 |
> 00:00:01.83 | 1075 | | | |
> |* 2 | HASH JOIN RIGHT OUTER| | 1 | 100K| 100K|
> 00:00:01.53 | 1075 | 1821K| 1267K| 2242K (0)|
> | 3 | TABLE ACCESS FULL | T6 | 1 | 50000 | 50000 |
> 00:00:00.10 | 203 | | | |
> | 4 | TABLE ACCESS FULL | T5 | 1 | 100K| 100K|
> 00:00:00.20 | 872 | | | |
> --------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - filter("T6"."MY_DATE" IS NULL)
> 2 - access("T5"."MY_DATE"="T6"."MY_DATE")
>
> This query required 1.83 seconds, which is a fraction of a second
> longer than the previous version that Oracle transformed, much better
> than the 5 minutes and 45 seconds of the first run.
>
> This time, let's let Oracle transform the outer join:
> SELECT
> T5.MY_DATE,
> T5.MY_NUMBER,
> T5.MY_ROW
> FROM
> T5,
> T6
> WHERE
> T5.MY_DATE=T6.MY_DATE(+)
> AND T6.MY_DATE IS NULL;
>
> The DBMS Xplan looks like this:
> ------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows | A-
> Time | Buffers | OMem | 1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------
> |* 1 | HASH JOIN RIGHT ANTI| | 1 | 50000 | 50000 |
> 00:00:01.43 | 1075 | 1821K| 1267K| 2194K (0)|
> | 2 | TABLE ACCESS FULL | T6 | 1 | 50000 | 50000 |
> 00:00:00.10 | 203 | | | |
> | 3 | TABLE ACCESS FULL | T5 | 1 | 100K| 100K|
> 00:00:00.20 | 872 | | | |
> ------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - access("T5"."MY_DATE"="T6"."MY_DATE")
>
> The plan and times are identical to the first transformation of the
> NOT IN syntax.
>
> Let's try the NOT EXISTS syntax, but not let Oracle transform the SQL
> statement:
> SELECT /*+ NO_QUERY_TRANSFORMATION */
> T5.MY_DATE,
> T5.MY_NUMBER,
> T5.MY_ROW
> FROM
> T5
> WHERE
> NOT EXISTS (
> SELECT
> T6.MY_DATE
> FROM
> T6
> WHERE
> T6.MY_DATE=T5.MY_DATE);
>
> The DBMS Xplan looks like this:
> -------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows | A-
> Time | Buffers |
> -------------------------------------------------------------------------------------
> |* 1 | FILTER | | 1 | | 50000 |
> 00:05:43.30 | 14M|
> | 2 | TABLE ACCESS FULL| T5 | 1 | 100K| 100K|
> 00:00:00.30 | 872 |
> |* 3 | TABLE ACCESS FULL| T6 | 100K| 1 | 50000 |
> 00:05:42.21 | 14M|
> -------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - filter( IS NULL)
> 3 - filter("T6"."MY_DATE"=:B1)
>
> 5 minutes and 43 seconds to execute, which is very close the non-
> transformed NOT IN syntax - the plan looks quite similar also.
>
> Let's try the NOT EXISTS syntax, but this time let Oracle transform
> the SQL statement:
> SELECT
> T5.MY_DATE,
> T5.MY_NUMBER,
> T5.MY_ROW
> FROM
> T5
> WHERE
> NOT EXISTS (
> SELECT
> T6.MY_DATE
> FROM
> T6
> WHERE
> T6.MY_DATE=T5.MY_DATE);
>
> The DBMS Xplan looks like this:
> ------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows | A-
> Time | Buffers | OMem | 1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------
> |* 1 | HASH JOIN RIGHT ANTI| | 1 | 50000 | 50000 |
> 00:00:01.43 | 1075 | 1821K| 1267K| 2194K (0)|
> | 2 | TABLE ACCESS FULL | T6 | 1 | 50000 | 50000 |
> 00:00:00.10 | 203 | | | |
> | 3 | TABLE ACCESS FULL | T5 | 1 | 100K| 100K|
> 00:00:00.20 | 872 | | | |
> ------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - access("T6"."MY_DATE"="T5"."MY_DATE")
>
> The plan and times are identical to the first transformation of the
> NOT IN syntax, and of course the outer join syntax.
>
> In summary, unassisted by Oracle's transformations, the outer join
> syntax executes in the least amount of time. Allowing Oracle
> transformations, all methods execute using the same plan.
>
> Now, what if we allow NULL values?
> ALTER TABLE T5 MODIFY (MY_DATE NULL);
> ALTER TABLE T6 MODIFY (MY_DATE NULL);
>
> INSERT INTO
> T5
> SELECT
> NULL,
> COS((ROWNUM-1)/180*3.141592),
> ROWNUM-1
> FROM
> DUAL
> CONNECT BY
> LEVEL<=1;
>
> INSERT INTO
> T6
> SELECT
> NULL,
> COS((ROWNUM-1)/180*3.141592),
> ROWNUM-1
> FROM
> DUAL
> CONNECT BY
> LEVEL<=1;
>
> I will leave this experiment to you, but you will likely find that the
> transformations available to Oracle have changed. To get you started,
> here is the DBMS Xplan for the NOT IN syntax, even when allowing
> Oracle to transform the SQL statement:
> -------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows | A-
> Time | Buffers |
> -------------------------------------------------------------------------------------
> |* 1 | FILTER | | 1 | | 50000 |
> 00:06:33.41 | 14M|
> | 2 | TABLE ACCESS FULL| T5 | 1 | 100K| 100K|
> 00:00:00.30 | 872 |
> |* 3 | TABLE ACCESS FULL| T6 | 100K| 1 | 50000 |
> 00:06:32.36 | 14M|
> -------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - filter( IS NULL)
> 3 - filter(LNNVL("T6"."MY_DATE"<>:B1))
>
> 6 minutes and 33 seconds, a bit longer than the original 5 minutes and
> 45 seconds
>
> Here is the plan for the outer join syntax with the table modification
> that allows NULLS:
> ------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | A-Rows | A-
> Time | Buffers | OMem | 1Mem | Used-Mem |
> ------------------------------------------------------------------------------------------------------------------
> |* 1 | HASH JOIN RIGHT ANTI| | 1 | 50000 | 50000 |
> 00:00:01.38 | 1075 | 1821K| 1267K| 2198K (0)|
> | 2 | TABLE ACCESS FULL | T6 | 1 | 50000 | 50000 |
> 00:00:00.10 | 203 | | | |
> | 3 | TABLE ACCESS FULL | T5 | 1 | 100K| 100K|
> 00:00:00.20 | 872 | | | |
> ------------------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - access("T5"."MY_DATE"="T6"."MY_DATE")
>
> 1.38 seconds.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.

The last two explain plans above were generated without adding the records containing the NULL values into the T5 and T6 tables - only the table definitions were modified. In the above tests with the outer join syntax, I cheated a bit, as I knew more about the data that was placed in the tables than I do about the data placed in the OPs tables. This knowledge allowed me to remove the DISTINCT requirement on the T6 table, which also removed the requirement for an inline view.

With NULLs still permitted on T5.MY_DATE and T6.MY_DATE, the results when I do not know if there will be unique T6.MY_DATE values, allowing for no transformations:
SELECT /*+ NO_QUERY_TRANSFORMATION */

  T5.MY_DATE,
  T5.MY_NUMBER,
  T5.MY_ROW

FROM
  T5,
  (SELECT DISTINCT
    T6.MY_DATE
  FROM
    T6) T6
WHERE
  T5.MY_DATE=T6.MY_DATE(+)
  AND T6.MY_DATE IS NULL;

| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time | Buffers | OMem | 1Mem | Used-Mem |
|*  1 |  FILTER                |      |      1 |        |  50000 |
00:00:02.27 |    1075 |       |       |          |
|*  2 |   HASH JOIN RIGHT OUTER|      |      1 |    100K|    100K|
00:00:01.96 |    1075 |  1821K|  1267K| 2198K (0)|
|   3 |    VIEW                |      |      1 |  50000 |  50000 |
00:00:00.52 |     203 |       |       |          |
|   4 |     HASH UNIQUE        |      |      1 |  50000 |  50000 |
00:00:00.32 |     203 |       |       |          |
|   5 |      TABLE ACCESS FULL | T6   |      1 |  50000 |  50000 |
00:00:00.10 |     203 |       |       |          |
|   6 |    TABLE ACCESS FULL   | T5   |      1 |    100K|    100K|
00:00:00.20 |     872 |       |       |          |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   1 - filter("T6"."MY_DATE" IS NULL)
   2 - access("T5"."MY_DATE"="T6"."MY_DATE")

The above required 2.27 seconds to complete.

Let's try again, allowing for query transformations: SELECT

  T5.MY_DATE,
  T5.MY_NUMBER,
  T5.MY_ROW

FROM
  T5,
  (SELECT DISTINCT
    T6.MY_DATE
  FROM
    T6) T6
WHERE
  T5.MY_DATE=T6.MY_DATE(+)
  AND T6.MY_DATE IS NULL;

| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time | Buffers | OMem | 1Mem | Used-Mem |

|* 1 | HASH JOIN RIGHT ANTI| | 1 | 50000 | 50000 | 00:00:01.87 | 1075 | 1821K| 1267K| 2198K (0)|
|   2 |   VIEW               |      |      1 |  50000 |  50000 |
00:00:00.58 |     203 |       |       |          |
|   3 |    HASH UNIQUE       |      |      1 |  50000 |  50000 |
00:00:00.33 |     203 |       |       |          |
|   4 |     TABLE ACCESS FULL| T6   |      1 |  50000 |  50000 |
00:00:00.10 |     203 |       |       |          |
|   5 |   TABLE ACCESS FULL  | T5   |      1 |    100K|    100K|
00:00:00.20 |     872 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   1 - access("T5"."MY_DATE"="T6"."MY_DATE")

The above required 1.87 seconds to complete.

Here is the explain plan for the NOT EXISTS syntax when the tables permit NULLS in the T5.MY_DATE and T6.MY_DATE columns:



| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time | Buffers | OMem | 1Mem | Used-Mem |

|* 1 | HASH JOIN RIGHT ANTI| | 1 | 50000 | 50000 | 00:00:01.43 | 1075 | 1821K| 1267K| 2197K (0)|
|   2 |   TABLE ACCESS FULL  | T6   |      1 |  50000 |  50000 |
00:00:00.10 |     203 |       |       |          |
|   3 |   TABLE ACCESS FULL  | T5   |      1 |    100K|    100K|
00:00:00.20 |     872 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   1 - access("T6"."MY_DATE"="T5"."MY_DATE")

As the above illustrates, there is no clear answer as to which method is best from a performance standpoint. Performance depends on the data in the tables, Oracle version, table definitions, the transformations that Oracle is permitted to use, and initialization parameter values. The rows output by the above methods may differ slightly when NULLs are encountered.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Sun Jul 08 2007 - 07:37:19 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US