Efficiency implications of outer joins

From: Dave Wotton <Dave.Wotton_at_it.camcnty.no-spam.gov.uk>
Date: 1998/02/25
Message-ID: <6d1e5f$7qu@dns.camcnty.gov.uk>#1/1

Hi all,

This is not a problem, but simply a posting for discussion. ( Please do not send me "solutions" or alternative techniques! ) I am writing a brief guideline for our programmers on the efficiency implications of outer joins. The guideline follows:

Outer joins affect the performance of queries in two ways. One is obvious and expected, the other is not.

The obvious implication of outer joins is that, since a row is returned for every row in the driving table, an outer join will not reduce the number of rows passed through to the next join condition in the same way that a standard join would.

A much more significant effect of outer joins is that it forces the query to select the non-deficient table as the driving table. This may be undesirable. Consider the following example:

           SELECT ORDER.CUSTOMER_NO,
                  ORD_LINE.AMOUNT
             FROM ORDER, ORDER_LINE
            WHERE ORDER.ORDER_NO = ORD_LINE.ORDER_NO(+)

This prints out orders, displaying the customer number ( from the ORDER table ) and the amount for each line ( from ORD_LINE ), and includes orders which have no lines. The query is parsed as a nested-loops construct, with the ORDER table as the driving table.

Suppose we extend the query as follows:

           SELECT ORDER.CUSTOMER_NO,
                  ORD_LINE.AMOUNT
             FROM ORDER, ORDER_LINE
            WHERE ORDER.ORDER_NO = ORD_LINE.ORDER_NO(+)
              AND ORD_LINE.ITEM_NO = 7

The query will still be driven by ORDER. This is particularly distressing if there are thousands of orders but the index on ORDER_LINE.ITEM_NO was very selective and would have reduced the number of rows processed had it been used.

Note that it is pointless implementing an outer join on a table if there are other conditions on the deficient table, as in this example. The effect of an outer join is to return additional rows from the driving table which have no matching row on the driven table. When this happens, the dummy row from the driven table is populated with nulls. If there is an additional condition on the driven table ( ORD_LINE.ITEM-NO = 7, in the above example ), then that will eliminate the additional rows which the outer join supplied. When there additional conditions on the driven table, the outer join can be safely replaced by a standard join. The optimiser will then have the flexibility to select the other table as the driving table and so improve the performance of the query, often quite dramatically.

Usually, the situation is not as obvious as in the above example. Outer joins are often hidden in views. For example:

        CREATE VIEW ORDER_VIEW AS
           SELECT ORDER.CUSTOMER_NO,
                  ORD_LINE.AMOUNT
             FROM ORDER, ORDER_LINE
            WHERE ORDER.ORDER_NO = ORD_LINE.ORDER_NO(+)

        SELECT CUSTOMER_NO, AMOUNT
          FROM ORDER_VIEW
         WHERE ITEM_NO = 7

The SELECT statement in the above example will generate the same very inefficient execution plan as the SELECT statement in the previous example.

My question: since there's a fairly simple way of deciding when it's safe to replace an outer join by a standard one, enabling the optimiser to have much more flexibility in choosing an efficient access path, why doesn't the optimiser do it itself? Or have I missed something? ( We're currently using Oracle 7.3.3 )

Dave.

-- 

To reply by email, remove the "no-spam" bit from my email address.

Received on Wed Feb 25 1998 - 00:00:00 CST