Re: Ramming two SQL queries together.....
Date: Fri, 28 May 2010 11:38:47 -0700 (PDT)
Message-ID: <4d4f069c-dc39-4135-96ca-3d55d7708710_at_40g2000vbr.googlegroups.com>
On May 28, 1:23 pm, joel garry <joel-ga..._at_home.com> wrote:
> <mr.frog.to...._at_googlemail.com> wrote:
> > Thankyou both gentlemen. I appreciate the feedback you have given me.
> > I am not an Oracle expert so I was not sure what the best approach
> > was. I will set both up and see which runs faster (I am guessing the
> > inline due to the way correlated subqueries run).
>
> > I thank you both very much.
>
> > Cheers
>
> > The Frog
>
> I'm wondering about how you think correlated subqueries run. See
> http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/querie...
> http://jonathanlewis.wordpress.com/2007/03/08/transformation-and-opti...
>
> If you haven't already, get Jonathan's book about the optimizer. See
> this example: http://jonathanlewis.wordpress.com/2006/11/06/filter-subqueries/
>
> jg
Joel,
I wonder if it is not just a terminology problem here. Essentially, what do you call it when a SELECT statement appears in a column position of another query? I think that Walt and The Frog were on the same page regarding what they were discussing.
I think that the Oracle definition of a correlated subquery is a
SELECT statement that appears in the WHERE clause of another SQL
statement, where one or more columns from the parent SQL statement are
directly related to one or more columns (or generated values) from the
SELECT statement in the WHERE clause.
http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/statements_10002.htm#i2066912
"The following examples show the general syntax of a correlated
subquery:
SELECT select_list
FROM table1 t_alias1
WHERE expr operator
(SELECT column_list FROM table2 t_alias2 WHERE t_alias1.column operator t_alias2.column);"
I think that the proper name for the case where a subquery appears in a column position of another query is a scalar subquery, although I could not think of that name at first.
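To make the terminology concrete, here is a minimal sketch of the two subquery positions. This is not Oracle; it uses Python's built-in sqlite3 with made-up toy tables, purely to show the difference between a correlated subquery in the WHERE clause and a scalar subquery in a column position:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables (hypothetical, much smaller than the test case below).
conn.executescript("""
CREATE TABLE t1 (id INTEGER, description TEXT);
CREATE TABLE t2 (c1 INTEGER, c2 TEXT);
INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
INSERT INTO t2 VALUES (1, 'A'), (2, 'B');
""")

# Correlated subquery in the WHERE clause: restricts which t1 rows qualify.
where_rows = conn.execute("""
SELECT t1.id FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.c1 = t1.id)
""").fetchall()

# Scalar subquery in a column position: every t1 row is returned,
# with one value (or NULL) computed per row.
column_rows = conn.execute("""
SELECT t1.id,
       (SELECT t2.c2 FROM t2 WHERE t2.c1 = t1.id) AS t2_c2
FROM t1
""").fetchall()

print(where_rows)   # only the matched ids
print(column_rows)  # all t1 rows; NULL where t2 has no match
```

The WHERE-clause form filters the parent rows, while the column-position form never filters - it decorates each parent row with a looked-up value.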
Here is a possibly interesting test case:
CREATE TABLE T1 (
ID NUMBER,
DESCRIPTION VARCHAR2(80));
INSERT INTO T1
SELECT
CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
'This is the long description for this number '||
TO_CHAR(CEIL(ABS(SIN(ROWNUM/9.9999)*10000)))
FROM
(SELECT
ROWNUM RN
FROM
DUAL
CONNECT BY
LEVEL<=1000),
(SELECT
ROWNUM RN
FROM
DUAL
CONNECT BY
LEVEL<=1000);
CREATE INDEX IND_T1 ON T1(ID);
CREATE TABLE T2 AS
SELECT
ROWNUM C1,
LPAD('A',100,'A') C2
FROM
DUAL
CONNECT BY
LEVEL<=10000;
CREATE TABLE T3 AS
SELECT
ROWNUM C1,
LPAD('A',100,'A') C2
FROM
DUAL
CONNECT BY
LEVEL<=10000;
CREATE TABLE T4 AS
SELECT
ROWNUM C1,
LPAD('A',100,'A') C2
FROM
DUAL
CONNECT BY
LEVEL<=10000;
CREATE INDEX IND_T4 ON T4(C1);
COMMIT;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE)
There are just 1,000,000 rows in table T1.
Now, let's try a test script using the tables:
SET AUTOTRACE TRACEONLY EXPLAIN
SELECT
  T1.ID, T2.C1 T2_C1, T3.C1 T3_C1, T4.C1 T4_C1
FROM
T1, T2, T3,
T4
WHERE
T2.C1 BETWEEN 1 AND 200
AND T2.C1=T3.C1 AND T2.C1=T4.C1 AND T2.C1=T1.ID;
SELECT
T1.ID,
(SELECT
T2.C1
FROM
T2
WHERE
T1.ID=T2.C1) T2_C1,
(SELECT
T3.C1
FROM
T3
WHERE
T1.ID=T3.C1) T3_C1,
(SELECT
T4.C1
FROM
T4
WHERE
T1.ID=T4.C1) T4_C1
FROM
T1
WHERE
T1.ID BETWEEN 1 AND 200;
The first SQL statement directly joins the tables, while the second
places SELECT statements in column positions. The output (11.1.0.7):
SQL> SELECT
  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1
  6  FROM
  7    T1,
  8    T2,
  9    T3,
 10    T4
 11  WHERE
 12    T2.C1 BETWEEN 1 AND 200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;
Execution Plan
Plan hash value: 3780653648
-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |   457 |  7312 | 10041   (1)| 00:00:41 |
|*  1 |  HASH JOIN           |        |   457 |  7312 | 10041   (1)| 00:00:41 |
|*  2 |   HASH JOIN          |        |   198 |  2376 |    46   (5)| 00:00:01 |
|*  3 |    HASH JOIN         |        |   199 |  1592 |    43   (3)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2     |   200 |   800 |    21   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| T3     |   200 |   800 |    21   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN  | IND_T4 |   200 |   800 |     2   (0)| 00:00:01 |
|*  7 |  TABLE ACCESS FULL   | T1     | 20002 | 80008 |  9994   (1)| 00:00:41 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
   1 - access("T2"."C1"="T1"."ID")
   2 - access("T2"."C1"="T4"."C1")
   3 - access("T2"."C1"="T3"."C1")
   4 - filter("T2"."C1"<=200 AND "T2"."C1">=1)
   5 - filter("T3"."C1"<=200 AND "T3"."C1">=1)
   6 - access("T4"."C1">=1 AND "T4"."C1"<=200)
   7 - filter("T1"."ID"<=200 AND "T1"."ID">=1)
SQL> SELECT
2 T1.ID,
3 (SELECT
4 T2.C1
5 FROM
6 T2
7 WHERE
8 T1.ID=T2.C1) T2_C1,
9 (SELECT
10 T3.C1
11 FROM
12 T3
13 WHERE
14 T1.ID=T3.C1) T3_C1,
15 (SELECT
16 T4.C1
17 FROM
18 T4
19 WHERE
20 T1.ID=T4.C1) T4_C1
21 FROM
22 T1
23 WHERE
24 T1.ID BETWEEN 1 AND 200;
Execution Plan
Plan hash value: 2945978589
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 20002 | 80008 |  9994   (1)| 00:00:41 |
|*  1 |  TABLE ACCESS FULL| T2     |     1 |     4 |    21   (0)| 00:00:01 |
|*  2 |  TABLE ACCESS FULL| T3     |     1 |     4 |    21   (0)| 00:00:01 |
|*  3 |  INDEX RANGE SCAN | IND_T4 |     1 |     4 |     1   (0)| 00:00:01 |
|*  4 |  TABLE ACCESS FULL| T1     | 20002 | 80008 |  9994   (1)| 00:00:41 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
   1 - filter("T2"."C1"=:B1)
   2 - filter("T3"."C1"=:B1)
   3 - access("T4"."C1"=:B1)
   4 - filter("T1"."ID"<=200 AND "T1"."ID">=1)
If we were to actually run the SQL statements, we might find that the first runs in about 15 seconds and the second in about 16 seconds, both with the same number of physical reads. That is no fun, so let's change the number 200 to 1200 to see what happens.
SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
SET ARRAYSIZE 1000
SELECT
T1.ID, T2.C1 T2_C1, T3.C1 T3_C1, T4.C1 T4_C1
FROM
T1, T2, T3,
T4
WHERE
T2.C1 BETWEEN 1 AND 1200
AND T2.C1=T3.C1 AND T2.C1=T4.C1 AND T2.C1=T1.ID;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT
T1.ID,
(SELECT
T2.C1
FROM
T2
WHERE
T1.ID=T2.C1) T2_C1,
(SELECT
T3.C1
FROM
T3
WHERE
T1.ID=T3.C1) T3_C1,
(SELECT
T4.C1
FROM
T4
WHERE
T1.ID=T4.C1) T4_C1
FROM
T1
WHERE
T1.ID BETWEEN 1 AND 1200;
Here is the output:
SQL> SELECT
  2    T1.ID,
  3    T2.C1 T2_C1,
  4    T3.C1 T3_C1,
  5    T4.C1 T4_C1
  6  FROM
  7    T1,
  8    T2,
  9    T3,
 10    T4
 11  WHERE
 12    T2.C1 BETWEEN 1 AND 1200
 13    AND T2.C1=T3.C1
 14    AND T2.C1=T4.C1
 15    AND T2.C1=T1.ID;
76580 rows selected.
Elapsed: 00:00:15.96
Statistics
          1  recursive calls
          0  db block gets
      83197  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed
SQL> SELECT
2 T1.ID,
3 (SELECT
4 T2.C1
5 FROM
6 T2
7 WHERE
8 T1.ID=T2.C1) T2_C1,
9 (SELECT
10 T3.C1
11 FROM
12 T3
13 WHERE
14 T1.ID=T3.C1) T3_C1,
15 (SELECT
16 T4.C1
17 FROM
18 T4
19 WHERE
20 T1.ID=T4.C1) T4_C1
21 FROM
22 T1
23 WHERE
24 T1.ID BETWEEN 1 AND 1200;
76580 rows selected.
Elapsed: 00:01:40.09
Statistics
          1  recursive calls
          0  db block gets
   10073639  consistent gets
      83110  physical reads
          0  redo size
    1288037  bytes sent via SQL*Net to client
       1217  bytes received via SQL*Net from client
         78  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      76580  rows processed
The number of consistent gets jumps significantly for the second SQL
statement, and so does the execution time (and CPU usage). Compare with
the output for T1.ID between 1 and 200:
12732 rows selected.
Elapsed: 00:00:17.54
Statistics
          0  recursive calls
          0  db block gets
     522390  consistent gets
      83108  physical reads
          0  redo size
     196813  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12732  rows processed
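A rough way to see why the consistent gets climb so steeply: the join form can build a hash table from each small table once and probe it cheaply per row, while each column-position subquery against an unindexed table may have to rescan that table once per distinct parent value (at best, thanks to scalar subquery caching) or once per row (at worst). The following is a hedged back-of-the-envelope model, not Oracle internals; the row counts are taken from the 1200-value test above:

```python
# Model the work done per parent row under the two plan shapes.
# 76,580 parent rows carrying 1,200 distinct join values (from the test).
t1_ids = [i % 1200 + 1 for i in range(76580)]

# Join-style: scan/build each lookup table once, then O(1) hash probes.
hash_build_scans = 1            # one full scan of the lookup table
hash_probes = len(t1_ids)       # cheap in-memory probes, no extra table scans

# Scalar-subquery-style with no index on the lookup table: a full scan
# per distinct parent value at best (scalar subquery caching), or per
# parent row at worst.
full_scans_best = len(set(t1_ids))
full_scans_worst = len(t1_ids)

print(hash_build_scans, full_scans_best, full_scans_worst)
```

Even the best case here is 1,200 full scans of T2 and T3 versus one scan each for the hash joins, which is consistent with the roughly 120-fold jump in consistent gets seen above.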
---
I think that The Frog is on the right track to test the performance of both approaches.

Charles Hooper
Co-author of "Expert Oracle Practices: Oracle Database Administration from the Oak Table"
http://hoopercharles.wordpress.com/
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

Received on Fri May 28 2010 - 13:38:47 CDT