Re: SQL Where clause order of operations?

From: Dave.Wotton <Dave.Wotton_at_no-spam.it.camcnty.gov.uk>
Date: 12 Jul 1998 14:13:00 +0100
Message-ID: <6oacos$ldc@dns.camcnty.gov.uk>

> From: drpaner_at_intrex.net
> Date: Fri, 10 Jul 1998 20:41:03 GMT
>

[ snip ]
>
>Question: How does Oracle interpret the Where clause criteria? Are table
>joins completed first or are non-table joins completed first?
>

Hi Daniel,

The optimiser does not apply the filter conditions last, in fact it applies the filter conditions as soon as it can ( but this does not necessarily mean first! )

There are a lot of myths about how the optimisers work, but at the risk of inadvertantly adding a few more, this is my understanding about what happens.

Note that these comments apply to the rule-based optimiser, or the cost-based optimiser where the tables haven't been analysed and there are no hints. A correctly configured cost-based optimiser works in a completely different way.

First, the optimiser must choose a driving table. It does this by first considering the predicates in the WHERE clause. ( There are two types of conditions in a WHERE clause: predicates and joins. A predicate is a comparison of a column against a value or range, joins should be self-evident ). In your case, you have two predicates: S.SALES_WEEK <= AI_WEEKBOUNDARY and Q.SALES_WEEK IS NULL.

The optimiser ranks predicates: eg. a comparison of a uniquely-indexed column against a single value is rated higher than a comparison of an indexed column against a range ( eg. aaa LIKE 'FRED%' ), which is rated higher than a condition on an unindexed column etc. There are various references which describe the order of rankings. Once the predicate with the highest ranking has been identified, the table to which it relates is used as the driving table. If there are two or more predicates with the same ranking, then the order of the tables in the FROM clause is used to choose between them, with the one placed last taking precedence. ( It is this feature which leads to the common confusion that the order of the tables in the FROM clause itself determines which is the driving table. It doesn't, it's the second consideration. ) In your case, your condition S.SALES_WEEK <= AI_WEEKBOUNDARY has highest ranking, so table INC_SALES_SUMMARY is used as the driving table.

The predicate is then used to select matching rows from the table, and any other filter conditions ( with some exceptions, like un-correlated subqueries ) on that table are applied before passing the result set onto the join operator to join to the next table.

You can see this if you generate a trace file and analyse it with tkprof. For your query, you should get something like ( I'm guessing about index names and made up the numbers of rows returned ):

Rows Execution Plan

-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: RULE
    753   FILTER                                          <= (2)
   2126    NESTED LOOPS (OUTER)
   4500     TABLE ACCESS (FULL) OF 'INC_SALES_SUMMARY'    <= (1)
   2126       TABLE ACCESS (BY ROWID) OF 'INC_QUOTA'
   2127         INDEX (RANGE SCAN) OF 'INC_QUOTA1' (UNIQUE)

Look at the first column, which is the number of rows returned by each operation.

(1) Table INC_SALES_SUMMARY was chosen as the driving table. A full

table scan was performed ( I'm assuming no index on S.SALES_WEEK ), which returned 4500 rows, but only 2127 index searches were performed on the next table. This is because 2373 rows were filtered out by your condition: S.SALES_WEEK <= AI_WEEKBOUNDARY

(2) In the explain plan output, filter operations, like the one above

are normally implicit. You usually only see the final FILTER operation shown explicitly, when there's no other operation to pass the result set on to. ( It's this that causes the confusion that filter operations are performed last. It's not true: you only *see* the last one. ) In this case, the join returned 2126 rows, and the condition Q.SALES_WEEK IS NULL filtered out all but 753 of them. You might find it beneficial to restructure your sql so that the INC_QUOTA table is queried first, and the Q.SALES_WEEK is null condition is applied before joining to INC_SALES_SUMMARY, but then the S.SALES_WEEK <= AI_WEEKBOUNDARY condition will be applied later. It's swings and roundabouts, and until I've seen a real explain plan with real values for number of rows in it, I can't tell which is best.

Finally, I've seen the other postings about enclosing the conditions in brackets. From my understanding, I don't think this will make any difference, but I'm willing to stand corrected.

Hope this helps,

Dave.

--
Remove the .nospam bit from my address to reply by email. Received on Sun Jul 12 1998 - 08:13:00 CDT