Home » RDBMS Server » Performance Tuning » SubQuery - Joins performance compare (Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production)
SubQuery - Joins performance compare [message #580070] Tue, 19 March 2013 18:17 Go to next message
isri
Messages: 14
Registered: November 2012
Junior Member
Hello,

I'm wondering about difference in performance between SubQueries used in Select statement and Joins operations. Let's have a look for a very basic example:

********************************************************************************

SQL ID: 7qf3v64qrrjcg Plan Hash: 3238298962

SELECT D.DEPARTMENT_ID, L.LOCATION_ID 
FROM
 HR.DEPARTMENTS D LEFT JOIN HR.LOCATIONS L ON (L.LOCATION_ID = D.LOCATION_ID)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0          6          0          27
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0          6          0          27

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 5  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
        27         27         27  NESTED LOOPS OUTER (cr=6 pr=0 pw=0 time=392 us cost=2 size=270 card=27)
        27         27         27   TABLE ACCESS FULL DEPARTMENTS (cr=2 pr=0 pw=0 time=112 us cost=2 size=189 card=27)
        27         27         27   INDEX UNIQUE SCAN LOC_ID_PK (cr=4 pr=0 pw=0 time=111 us cost=0 size=3 card=1)(object id 20135)

********************************************************************************

SQL ID: amtfyvm0wv1q5 Plan Hash: 955629947

SELECT D.DEPARTMENT_ID, (SELECT LOCATION_ID 
FROM
 HR.LOCATIONS L WHERE L.LOCATION_ID = D.LOCATION_ID) FROM HR.DEPARTMENTS D


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0          6          0          27
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0          6          0          27

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 5  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         7          7          7  INDEX UNIQUE SCAN LOC_ID_PK (cr=4 pr=0 pw=0 time=37 us cost=0 size=3 card=1)(object id 20135)
        27         27         27  TABLE ACCESS FULL DEPARTMENTS (cr=2 pr=0 pw=0 time=58 us cost=2 size=189 card=27)




********************************************************************************


I've thought that writing such subqueries in select statement are rather not good when we are thinking about performance (and of course code readability).
I know example above is quite easy, and it looks like school homework - it is not Wink
Last time I was working on some query that was looking more less like that:

SELECT
A.COL1,
B.COL2,
A.COL3,
(SELECT COL4 FROM TABLEC WHERE ID = A.ID),
(SELECT COL5 FROM TABLEC WHERE ID = A.ID),
(SELECT COL6 FROM TABLED WHERE ID = A.ID),
(SELECT COL1 FROM TABLEE WHERE ID = A.ID),
(SELECT COL2 FROM TABLEF WHERE ID = A.ID),
(SELECT COL3 FROM TABLEF WHERE ID = A.ID),
(SELECT COL8 FROM TABLEG WHERE ID = A.ID),
(SELECT COL3 FROM TABLEG WHERE ID = A.ID),
(SELECT COL12 FROM TABLEG WHERE ID = A.ID),
(SELECT COL13 FROM TABLEH WHERE ID = A.ID),
(SELECT COL14 FROM TABLEH WHERE ID = B.ID),
(SELECT COL10 FROM TABLEA WHERE ID = B.ID),
(SELECT COL11 FROM TABLEA WHERE ID = B.ID),
(SELECT COL12 FROM TABLEB WHERE ID = B.ID),
(SELECT COL14 FROM TABLED WHERE ID = B.ID),
(SELECT COL15 FROM TABLEK WHERE ID = B.ID),
(SELECT COL16 FROM TABLEB WHERE ID = B.ID)

FROM TABA A INNER JOIN TABB B ON (A.ID = B.ID)

UNION ALL

SELECT (...)

UNION ALL

SELECT (...)


Additionally there were a lot of NVL and DECODE within subselects and additionally those subselects were usually made from few tables joined or 'unioned'. It was running quite well but was completely unreadable...
I've spent few hours to rebuild it to more readable way (using Joins instead of subselect) but each time I was removing one of subselect performance was decreasing. Example above, although it is very simple I think it shows very well how it looks like.

I wanted to ask is it normal and if there is a way to make query that is using JOINS and would work at least as fast as second query with subselects?


Many thanks

P.S. I'm using HR_SCHEMA delivered as example schema by Oracle, please see attachment.
  • Attachment: hr_whole.sql
    (Size: 74.47KB, Downloaded 171 times)
Re: SubQuery - Joins performance compare [message #580080 is a reply to message #580070] Wed, 20 March 2013 01:23 Go to previous messageGo to next message
Michel Cadot
Messages: 59821
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
It depends of the case there is no general rule except that do NOT use subquery in SELECT clause unless you have a very good reason to do it.

Regards
Michel
Re: SubQuery - Joins performance compare [message #580082 is a reply to message #580070] Wed, 20 March 2013 01:33 Go to previous messageGo to next message
michael_bialik
Messages: 605
Registered: July 2006
Senior Member
As usual - it depends.

In your example

SELECT D.DEPARTMENT_ID, L.LOCATION_ID 
FROM
 HR.DEPARTMENTS D LEFT JOIN HR.LOCATIONS L ON (L.LOCATION_ID = D.LOCATION_ID) 

vs

SELECT D.DEPARTMENT_ID, (SELECT LOCATION_ID 
FROM
 HR.LOCATIONS L WHERE L.LOCATION_ID = D.LOCATION_ID) FROM HR.DEPARTMENTS D


for both queries execution plan is about the same:
1. Full table scan on DEPARTMENTS
2. Nested loop join to LOCATIONS using unique index (for each row of DEPARTMENTS).

So the performance is about the same as well.

However when looking at "real" query you can see that it performs on average 2 accesses to the same row:
...(SELECT COL4 FROM TABLEC WHERE ID = A.ID),
   (SELECT COL5 FROM TABLEC WHERE ID = A.ID),...


accesses the same row of TABLEC twice,

...(SELECT COL8 FROM TABLEG WHERE ID = A.ID),
   (SELECT COL3 FROM TABLEG WHERE ID = A.ID),
   (SELECT COL12 FROM TABLEG WHERE ID = A.ID),...


causes 3 accesses to the same row of TABLEG and so on...

When using JOIN operation - all these tables will be accessed one time only. That's the main reason (IMHO) to use
JOIN instead of scalar selects.
Another reason is that scalar select simulates NESTED LOOP join. When joining large number of rows - HASH or even SORT-MERGE
may be preferable (performance wise), so there in no reason forcing optimizer to use one specific join type.

If your "JOIN" query performance is worse then query with sub/scalar selects - post tkprof.

HTH
Re: SubQuery - Joins performance compare [message #580087 is a reply to message #580082] Wed, 20 March 2013 02:56 Go to previous message
isri
Messages: 14
Registered: November 2012
Junior Member
Hello,

Many thanks for the answers. Unfortunately I'm not allowed to post here or anywhere the code I was working on, maybe I will be able to build simillar situation at home.

One more thing I was wondering - the 'time'

(...)
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
        27         27         27  NESTED LOOPS OUTER (cr=6 pr=0 pw=0 time=392 us cost=2 size=270 card=27)
        27         27         27   TABLE ACCESS FULL DEPARTMENTS (cr=2 pr=0 pw=0 time=112 us cost=2 size=189 card=27)
        27         27         27   INDEX UNIQUE SCAN LOC_ID_PK (cr=4 pr=0 pw=0 time=111 us cost=0 size=3 card=1)(object id 20135)

(...)

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         7          7          7  INDEX UNIQUE SCAN LOC_ID_PK (cr=4 pr=0 pw=0 time=37 us cost=0 size=3 card=1)(object id 20135)
        27         27         27  TABLE ACCESS FULL DEPARTMENTS (cr=2 pr=0 pw=0 time=58 us cost=2 size=189 card=27)
(...)


If I understand that correctly it is time that DB spent to perform specified operation. In firtst case (Joins) we've got 392 + 112 + 111 = 615, in second (subqueries) 37 + 58 = 95 - about 6 time faster. And that is surprise for me.

[Updated on: Wed, 20 March 2013 02:57]

Report message to a moderator

Previous Topic: dbms_xplan for format model
Next Topic: Full table scan
Goto Forum:
  


Current Time: Fri Nov 28 15:50:00 CST 2014

Total time taken to generate the page: 0.43397 seconds