Re: performance tuning questions: replace IN (values) by JOIN

From: Bosco Ng <boscong_at_leccotech.com>
Date: Sat, 30 Aug 2003 00:31:05 +0800
Message-ID: <3f4f7e22@shknews01>

The 2nd approach, if written properly, would always be the better one.

There is no need for the sub-query, the problem with your plan in the join order.
It now first join all rows of BAGS and BAG_LOG and then filter using the timestamp,
which ideally, it should first do the filtering on BAG_LOG before joining with BAGS.

It might probably because you use the unnecessary sub-query or not update stats or not stats on DTM_TIMESTAMP column (assume you use CBO) or the missing of index on DTM_TIMESTAMP (I would suggest adding index on that) that leads the optimizer to the wrong decision.

It would quite simply be:

SELECT ...
FROM BAG_LOG bl, BAGS b
WHERE bl.DTM_TIMESTAMP >= (TO_DATE('2003-19-08 0:00:00', 'yyyy-dd-mm HH24:MI:SS'))
AND bl.DTM_TIMESTAMP <= (TO_DATE('2003-19-08 2:00:00', 'yyyy-dd-mm HH24:MI:SS'))
AND b.ID = bl.BAG_ID

And proper stats. CBO should find you the right one for such a simple query.

Never use the 1st approach, it just sucks.

"Jérôme Lacoste - CoffeeBreaks" <lacostej_at_coffebreaks.org> wrote in message news:cJI3b.17331$BD3.2757926_at_juliett.dax.net...
> Hi,
>
> I am trying to improve an SQL query design while still keeping its
> performance.
>
>
> DB Model
> --------
> I have 2 tables, one called BAGS the other called BAG_LOG. Each has a
> primary key called ID. The BAG_LOG table has a foreign key called BAG_ID
> pointing to BAGS.ID.
>
> The relation is: a BAG has 1 or more BAG_LOG(s).
>
> The BAG_LOG table has a DTM_TIMESTAMP column.
>
> There are 40000 rows in BAGS and 170000 in BAG_LOG.
>
> There's no specific INDEX appart the ones generated automatically on the
> primary and foreign key.
>
>
>
> Query to improve
> ----------------
>
> Let's say I want to find all BAGS who have had a TIMESTAMP within a
> specific period. Additional criteria will add later on.
>
>
> One solution is to use 3 queries:
>
> select DISTINCT(BAG_ID) FROM BAG_LOG
> WHERE DTM_TIMESTAMP >= (TO_DATE('2003-19-08
> 0:00:00', 'yyyy-dd-mm HH24:MI:SS'))
> AND DTM_TIMESTAMP <= (TO_DATE('2003-19-08
> 2:00:00', 'yyyy-dd-mm HH24:MI:SS'));
>
> SELECT * FROM BAGS where ID IN
> (346564,346565,346566,346567,346568,346569);
>
> SELECT * FROM BAG_LOG where BAG_ID IN
> (346564,346565,346566,346567,346568,346569);
>
> The first query find the BAG IDs who satisfy the condition, the second
> and third queries find the rows of the BAG and BAG_LOG table that match
> the result of the first query. The second and third query are built
> dynamically by the calling program, based on the results returned by the
> first query.
>
>
> This has two drawbacks:
> - it doesn't exploit SQL capabilities
> - by using IN (list of values) one can hit a limitation if the first
> query returns more than a certain number of rows. (Oracle 9 has a
> default limit of 1000 possible values).
>
> It has one advantage
> - it is very simple SQL and will probably work on all DB I know.
>
>
>
> Possible improvement?
> ---------------------
>
> The 3 queries are joined into one. The columns are renamed so to be able
> to retrieve the values by column name (otherwise identical BAGS and
> BAG_LOG column names conflict).
>
> select BAGS.ID as BAGS__ID, [...],
> BAG_LOG.* FROM BAGS, BAG_LOG
> WHERE (BAGS.ID = BAG_LOG.BAG_ID AND
> BAGS.ID IN (select DISTINCT(BAG_ID) FROM BAG_LOG
> WHERE DTM_TIMESTAMP >= (TO_DATE('2003-19-08
> 00:00:00', 'yyyy-dd-mm HH24:MI:SS'))
> AND DTM_TIMESTAMP <= (TO_DATE('2003-19-08
> 12:00:00', 'yyyy-dd-mm HH24:MI:SS'))
> )
> ) ORDER BY BAGS.ID DESC, BAG_LOG.DTM_TIMESTAMP DESC;
>
> replace [...] by the full list of fields from the BAGS table one wants
> to retrieve. In my case all.
>
> [Note that I also added sorting in that query, but it doesn't seem to
> affect the performance in my case].
>
>
>
> Performance results & analysis:
> -------------------------------
>
> - (1) set of queries takes about 14 sec to perform
> - (2) takes 18 secs to perform, a 25% performance loss
>
>
> Here are some info using the autotrace function:
>
>
> ------ (1) --------
> 6 rows selected.
>
> Elapsed: 00:00:07.01
>
> Execution Plan
> ----------------------------------------------------------
>
> 0 SELECT STATEMENT Optimizer=CHOOSE
>
> 1 0 SORT (UNIQUE)
>
> 2 1 TABLE ACCESS (FULL) OF 'BAG_LOG'
>
>
>
>
>
> Statistics
> ----------------------------------------------------------
>
> 0 recursive calls
>
> 0 db block gets
>
> 34199 consistent gets
>
> 34184 physical reads
>
> 0 redo size
>
> 456 bytes sent via SQL*Net to client
>
> 499 bytes received via SQL*Net from client
>
> 2 SQL*Net roundtrips to/from client
>
> 1 sorts (memory)
>
> 0 sorts (disk)
>
> 6 rows processed
>
>
>
> 6 rows selected.
>
> Elapsed: 00:00:00.03
>
> Execution Plan
> ----------------------------------------------------------
>
> 0 SELECT STATEMENT Optimizer=CHOOSE
>
> 1 0 CONCATENATION
>
> 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'
>
> 3 2 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)
>
> 4 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'
>
> 5 4 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)
>
> 6 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'
>
> 7 6 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)
>
> 8 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'
>
> 9 8 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)
>
> 10 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'
>
> 11 10 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)
>
> 12 1 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'
>
> 13 12 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)
>
>
>
> Statistics
> ----------------------------------------------------------
>
> 0 recursive calls
>
> 0 db block gets
>
> 18 consistent gets
>
> 0 physical reads
>
> 0 redo size
>
> 915 bytes sent via SQL*Net to client
>
> 499 bytes received via SQL*Net from client
>
> 2 SQL*Net roundtrips to/from client
>
> 0 sorts (memory)
>
> 0 sorts (disk)
>
> 6 rows processed
>
>
>
> 12 rows selected.
>
> Elapsed: 00:00:07.07
>
> Execution Plan
> ----------------------------------------------------------
>
> 0 SELECT STATEMENT Optimizer=CHOOSE
>
> 1 0 TABLE ACCESS (FULL) OF 'BAG_LOG'
>
>
>
>
>
> Statistics
> ----------------------------------------------------------
>
> 0 recursive calls
>
> 0 db block gets
>
> 34200 consistent gets
>
> 34185 physical reads
>
> 0 redo size
>
> 1774 bytes sent via SQL*Net to client
>
> 499 bytes received via SQL*Net from client
>
> 2 SQL*Net roundtrips to/from client
>
> 0 sorts (memory)
>
> 0 sorts (disk)
>
> 12 rows processed
>
>
>
>
> ------ (2) ------
>
> 12 rows selected.
>
> Elapsed: 00:00:18.06
>
> Execution Plan
> ----------------------------------------------------------
>
> 0 SELECT STATEMENT Optimizer=CHOOSE
>
> 1 0 SORT (ORDER BY)
>
> 2 1 MERGE JOIN
>
> 3 2 SORT (JOIN)
>
> 4 3 NESTED LOOPS
>
> 5 4 TABLE ACCESS (FULL) OF 'BAG_LOG'
>
> 6 4 TABLE ACCESS (BY INDEX ROWID) OF 'BAGS'
>
> 7 6 INDEX (UNIQUE SCAN) OF 'PK_BAGS1' (UNIQUE)
>
> 8 2 SORT (JOIN)
>
> 9 8 VIEW OF 'VW_NSO_1'
>
> 10 9 SORT (UNIQUE)
>
> 11 10 TABLE ACCESS (FULL) OF 'BAG_LOG'
>
>
>
>
>
> Statistics
> ----------------------------------------------------------
>
> 0 recursive calls
>
> 26 db block gets
>
> 345418 consistent gets
>
> 71578 physical reads
>
> 0 redo size
>
> 2422 bytes sent via SQL*Net to client
>
> 499 bytes received via SQL*Net from client
>
> 2 SQL*Net roundtrips to/from client
>
> 3 sorts (memory)
>
> 1 sorts (disk)
>
> 12 rows processed
>
>
>
>
>
>
> I clearly see that my second solution uses much more memory (consistent
> gets is about 3 times the total used by the first 3 queries), and also
> do 2 more sorts in memory and one in the disk.
>
> But I do not see any obvious design flaw in my query....
>
>
>
> -- Questions --
> ---------------
>
> Q: Is the performance loss due to the NESTED LOOPS? or is it due to the
> JOIN in itself ?
>
> Q: I was wondering what kind of query should I use in order to manage to
> improve the design of my query (1) especially removing its limitations
> without introducing the performance loss as with my second query. Should
> I use a temporary table?
>
> Q: [slightly unrelated] How one can enable the autotrace to also display
> (Cost, Card and Byte) as shown in most of Oracle examples?
>
> Any help appreciated...
>
>
> Cheers,
>
> Jerome
>
Received on Fri Aug 29 2003 - 11:31:05 CDT