Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> performance tuning questions: replace IN (values) by JOIN

performance tuning questions: replace IN (values) by JOIN

From: Jérôme Lacoste - CoffeeBreaks <lacostej_at_coffebreaks.org>
Date: Fri, 29 Aug 2003 13:40:24 GMT
Message-ID: <cJI3b.17331$BD3.2757926@juliett.dax.net>


Hi,

I am trying to improve an SQL query design while still keeping its performance.

DB Model



I have 2 tables, one called BAGS the other called BAG_LOG. Each has a primary key called ID. The BAG_LOG table has a foreign key called BAG_ID pointing to BAGS.ID.

The relation is: a BAG has 1 or more BAG_LOG(s).

The BAG_LOG table has a DTM_TIMESTAMP column.

There are 40000 rows in BAGS and 170000 in BAG_LOG.

There's no specific INDEX appart the ones generated automatically on the primary and foreign key.

Query to improve


Let's say I want to find all BAGS who have had a TIMESTAMP within a specific period. Additional criteria will add later on.

One solution is to use 3 queries:

select DISTINCT(BAG_ID) FROM BAG_LOG

                    WHERE DTM_TIMESTAMP >= (TO_DATE('2003-19-08 
0:00:00', 'yyyy-dd-mm HH24:MI:SS'))
                      AND DTM_TIMESTAMP <= (TO_DATE('2003-19-08 
2:00:00', 'yyyy-dd-mm HH24:MI:SS'));

SELECT * FROM BAGS where ID IN
(346564,346565,346566,346567,346568,346569);

SELECT * FROM BAG_LOG where BAG_ID IN
(346564,346565,346566,346567,346568,346569);

The first query find the BAG IDs who satisfy the condition, the second and third queries find the rows of the BAG and BAG_LOG table that match the result of the first query. The second and third query are built dynamically by the calling program, based on the results returned by the first query.

This has two drawbacks:
- it doesn't exploit SQL capabilities
- by using IN (list of values) one can hit a limitation if the first query returns more than a certain number of rows. (Oracle 9 has a default limit of 1000 possible values).

It has one advantage
- it is very simple SQL and will probably work on all DB I know.

Possible improvement?


The 3 queries are joined into one. The columns are renamed so to be able to retrieve the values by column name (otherwise identical BAGS and BAG_LOG column names conflict).

select BAGS.ID as BAGS__ID, [...],

        BAG_LOG.* FROM BAGS, BAG_LOG
        WHERE (BAGS.ID = BAG_LOG.BAG_ID AND
        BAGS.ID IN (select DISTINCT(BAG_ID) FROM BAG_LOG
                    WHERE DTM_TIMESTAMP >= (TO_DATE('2003-19-08 
00:00:00', 'yyyy-dd-mm HH24:MI:SS'))
                      AND DTM_TIMESTAMP <= (TO_DATE('2003-19-08 
12:00:00', 'yyyy-dd-mm HH24:MI:SS'))
                   )
              ) ORDER BY BAGS.ID DESC, BAG_LOG.DTM_TIMESTAMP DESC;

replace [...] by the full list of fields from the BAGS table one wants to retrieve. In my case all.

[Note that I also added sorting in that query, but it doesn't seem to affect the performance in my case].

Performance results & analysis:


Here are some info using the autotrace function:

Elapsed: 00:00:07.01

Execution Plan


    0 SELECT STATEMENT Optimizer=CHOOSE

    1 0 SORT (UNIQUE)     2 1 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics


           0 recursive calls

           0 db block gets

       34199 consistent gets

       34184 physical reads

           0 redo size

         456 bytes sent via SQL*Net to client

         499 bytes received via SQL*Net from client

           2 SQL*Net roundtrips to/from client

           1 sorts (memory)

           0 sorts (disk)

           6 rows processed

6 rows selected.

Elapsed: 00:00:00.03

Execution Plan


    0 SELECT STATEMENT Optimizer=CHOOSE

    1 0 CONCATENATION     2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'     3 2 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)     4 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'     5 4 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)     6 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'     7 6 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)     8 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'     9 8 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)    10 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'    11 10 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)    12 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'    13 12 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE) Statistics


           0 recursive calls

           0 db block gets

          18 consistent gets

           0 physical reads

           0 redo size

         915 bytes sent via SQL*Net to client

         499 bytes received via SQL*Net from client

           2 SQL*Net roundtrips to/from client

           0 sorts (memory)

           0 sorts (disk)

           6 rows processed

12 rows selected.

Elapsed: 00:00:07.07

Execution Plan


    0 SELECT STATEMENT Optimizer=CHOOSE

    1 0 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics


           0 recursive calls

           0 db block gets

       34200 consistent gets

       34185 physical reads

           0 redo size

        1774 bytes sent via SQL*Net to client

         499 bytes received via SQL*Net from client

           2 SQL*Net roundtrips to/from client

           0 sorts (memory)

           0 sorts (disk)

          12 rows processed

12 rows selected.

Elapsed: 00:00:18.06

Execution Plan


    0 SELECT STATEMENT Optimizer=CHOOSE

    1 0 SORT (ORDER BY)     2 1 MERGE JOIN     3 2 SORT (JOIN)     4 3 NESTED LOOPS     5 4 TABLE ACCESS (FULL) OF 'BAG_LOG'     6 4 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'     7 6 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)     8 2 SORT (JOIN)     9 8 VIEW OF 'VW_NSO_1'    10 9 SORT (UNIQUE)    11 10 TABLE ACCESS (FULL) OF 'BAG_LOG' Statistics


           0 recursive calls

          26 db block gets

      345418 consistent gets

       71578 physical reads

           0 redo size

        2422 bytes sent via SQL*Net to client

         499 bytes received via SQL*Net from client

           2 SQL*Net roundtrips to/from client

           3 sorts (memory)

           1 sorts (disk)

          12 rows processed

I clearly see that my second solution uses much more memory (consistent gets is about 3 times the total used by the first 3 queries), and also do 2 more sorts in memory and one in the disk.

But I do not see any obvious design flaw in my query....

Any help appreciated...

Cheers,

Jerome Received on Fri Aug 29 2003 - 08:40:24 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US