Re: Ramming two SQL queries together.....

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Fri, 28 May 2010 11:38:47 -0700 (PDT)
Message-ID: <4d4f069c-dc39-4135-96ca-3d55d7708710_at_40g2000vbr.googlegroups.com>



On May 28, 1:23 pm, joel garry <joel-ga..._at_home.com> wrote:
> <mr.frog.to...._at_googlemail.com> wrote:
> > Thankyou both gentlemen. I appreciate the feedback you have given me.
> > I am not an Oracle expert so I was not sure what the best approach
> > was. I will set both up and see which runs faster (I am guessing the
> > inline due to the way correlated subqueries run).
>
> > I thank you both very much.
>
> > Cheers
>
> > The Frog
>
> I'm wondering about how you think correlated subqueries run.  Seehttp://download.oracle.com/docs/cd/B19306_01/server.102/b14200/querie...http://jonathanlewis.wordpress.com/2007/03/08/transformation-and-opti...
>
> If you haven't already, get Jonathan's book about the optimizer.  See
> this example:  http://jonathanlewis.wordpress.com/2006/11/06/filter-subqueries/
>
> jg

Joel,

I wonder if it is not just a terminology problem here. Essentially, what do you call it when a SELECT statement appears in a column position of another query. I think that Walt and The Frog were on the same page regarding what they are talking about.

I think that the Oracle definition of a correlated subquery is a SELECT statement that appears in the WHERE clause of another SQL statement, and one or more columns from the parent SQL statement is directly related to one or more columns (or generated values) from the SELECT statement in the WHERE clause.
http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/statements_10002.htm#i2066912 "The following examples show the general syntax of a correlated subquery:
SELECT select_list

    FROM table1 t_alias1
    WHERE expr operator

        (SELECT column_list
            FROM table2 t_alias2
            WHERE t_alias1.column
               operator t_alias2.column);"

I think that there is a proper name for the case where a subquery appears in a column position of another query, but I can't think of that name at the moment.

Here is possibly interesting test case:
CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80)); INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000))) FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000);

CREATE INDEX IND_T1 ON T1(ID); CREATE TABLE T2 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T3 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE TABLE T4 AS
SELECT
  ROWNUM C1,
  LPAD('A',100,'A') C2
FROM
  DUAL
CONNECT BY
  LEVEL<=10000;

CREATE INDEX IND_T4 ON T4(C1); COMMIT; EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE) EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE) EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE) There are just 1,000,000 rows in table T1.

Now, let's try a test script using the tables: SET AUTOTRACE TRACEONLY EXPLAIN SELECT

  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1

FROM
  T1,
  T2,
  T3,

  T4
WHERE
  T2.C1 BETWEEN 1 AND 200
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 200; The first SQL statement directly joins the tables, while the second places SELECT statements in column positions. The output (11.1.0.7): SQL> SELECT

  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1

  6 FROM
  7    T1,
  8    T2,
  9    T3,

 10 T4
 11 WHERE
 12 T2.C1 BETWEEN 1 AND 200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;

Execution Plan



Plan hash value: 3780653648
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)|
Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |   457 |  7312 | 10041   (1)|
00:00:41 |
|*  1 |  HASH JOIN           |        |   457 |  7312 | 10041   (1)|
00:00:41 |
|*  2 |   HASH JOIN          |        |   198 |  2376 |    46   (5)|
00:00:01 |
|*  3 |    HASH JOIN         |        |   199 |  1592 |    43   (3)|
00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2     |   200 |   800 |    21   (0)|
00:00:01 |
|*  5 |     TABLE ACCESS FULL| T3     |   200 |   800 |    21   (0)|
00:00:01 |
|*  6 |    INDEX RANGE SCAN  | IND_T4 |   200 |   800 |     2   (0)|
00:00:01 |
|*  7 |   TABLE ACCESS FULL  | T1     | 20002 | 80008 |  9994   (1)|
00:00:41 |

Predicate Information (identified by operation id):


   1 - access("T2"."C1"="T1"."ID")
   2 - access("T2"."C1"="T4"."C1")
   3 - access("T2"."C1"="T3"."C1")
   4 - filter("T2"."C1"<=200 AND "T2"."C1">=1)
   5 - filter("T3"."C1"<=200 AND "T3"."C1">=1)
   6 - access("T4"."C1">=1 AND "T4"."C1"<=200)
   7 - filter("T1"."ID"<=200 AND "T1"."ID">=1)

SQL> SELECT
  2 T1.ID,
  3 (SELECT
  4 T2.C1
  5 FROM
  6 T2
  7 WHERE
  8 T1.ID=T2.C1) T2_C1,
  9 (SELECT
 10 T3.C1
 11 FROM
 12 T3
 13 WHERE
 14 T1.ID=T3.C1) T3_C1,
 15 (SELECT
 16 T4.C1
 17 FROM
 18 T4
 19 WHERE
 20 T1.ID=T4.C1) T4_C1
 21 FROM
 22 T1
 23 WHERE
 24 T1.ID BETWEEN 1 AND 200; Execution Plan



Plan hash value: 2945978589
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)|
Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 20002 | 80008 |  9994   (1)|
00:00:41 |
|*  1 |  TABLE ACCESS FULL| T2     |     1 |     4 |    21   (0)|
00:00:01 |
|*  2 |  TABLE ACCESS FULL| T3     |     1 |     4 |    21   (0)|
00:00:01 |
|*  3 |  INDEX RANGE SCAN | IND_T4 |     1 |     4 |     1   (0)|
00:00:01 |
|*  4 |  TABLE ACCESS FULL| T1     | 20002 | 80008 |  9994   (1)|
00:00:41 |

Predicate Information (identified by operation id):


   1 - filter("T2"."C1"=:B1)
   2 - filter("T3"."C1"=:B1)
   3 - access("T4"."C1"=:B1)
   4 - filter("T1"."ID"<=200 AND "T1"."ID">=1)

If we were to actually run the SQL statements, we might find that the first runs in about 15 seconds and the second in about 16 seconds, both with the same number of physical reads. That is no fun, so let's change the number 200 to 1200 to see what happens.

SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
SET ARRAYSIZE 1000 SELECT

  T1.ID,
  T2.C1 T2_C1,
  T3.C1 T3_C1,
  T4.C1 T4_C1

FROM
  T1,
  T2,
  T3,

  T4
WHERE
  T2.C1 BETWEEN 1 AND 1200
  AND T2.C1=T3.C1
  AND T2.C1=T4.C1
  AND T2.C1=T1.ID;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE; SELECT
  T1.ID,
  (SELECT
    T2.C1
  FROM
    T2
  WHERE
    T1.ID=T2.C1) T2_C1,
  (SELECT
    T3.C1
  FROM
    T3
  WHERE
    T1.ID=T3.C1) T3_C1,
  (SELECT
    T4.C1
  FROM
    T4
  WHERE
    T1.ID=T4.C1) T4_C1
FROM
  T1
WHERE
  T1.ID BETWEEN 1 AND 1200; Here is the output:
SQL> SELECT

  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1

  6 FROM
  7    T1,
  8    T2,
  9    T3,

 10 T4
 11 WHERE
 12 T2.C1 BETWEEN 1 AND 1200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;

76580 rows selected.

Elapsed: 00:00:15.96

Statistics


          1  recursive calls
          0  db block gets
      83197  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed

SQL> SELECT
  2 T1.ID,
  3 (SELECT
  4 T2.C1
  5 FROM
  6 T2
  7 WHERE
  8 T1.ID=T2.C1) T2_C1,
  9 (SELECT
 10 T3.C1
 11 FROM
 12 T3
 13 WHERE
 14 T1.ID=T3.C1) T3_C1,
 15 (SELECT
 16 T4.C1
 17 FROM
 18 T4
 19 WHERE
 20 T1.ID=T4.C1) T4_C1
 21 FROM
 22 T1
 23 WHERE
 24 T1.ID BETWEEN 1 AND 1200; 76580 rows selected.

Elapsed: 00:01:40.09

Statistics


          1  recursive calls
          0  db block gets
   10073639  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed

The number of consistent gets jumps significantly for the second SQL statement and so did the execution time (and CPU usage), compare with the output for T1.ID between 1 and 200:
12732 rows selected.

Elapsed: 00:00:17.54

Statistics


          0  recursive calls
          0  db block gets
     522390  consistent gets
      83108  physical reads
          0  redo size
     196813  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12732  rows processed

---

I think that The Frog is on the right track to test the performance of
both approaches.

Charles Hooper
Co-author of "Expert Oracle Practices: Oracle Database Administration
from the Oak Table"
http://hoopercharles.wordpress.com/
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
Received on Fri May 28 2010 - 13:38:47 CDT

Original text of this message