Re: union all vs. left outer join

From: VC <boston103_at_hotmail.com>
Date: Tue, 8 Nov 2005 06:17:54 -0500
Message-ID: <hJmdnR_lvOb_FO3eRVn-iQ_at_comcast.com>


<henrik.sorensen_at_balcab.ch> wrote in message news:dkpb3t$fij$1_at_news.hispeed.ch...
> VC wrote:
>
>> <henrik.sorensen_at_balcab.ch> wrote in message
>>> The test case is reduced from a production system where each table have
>>> 200'000 records, and hence the materialization of the tables A-B-C is
>>> performing very very badly.
>>
>> You need to put together a reproducible test case which would demonstrate
>> that the first query is indeed slower than the second. What you've posted
>> does not show that.
> Ok fair enough...
> But even with this little testcase, for the first query when looking at
> the
> explain from both DB2 and Oracle, (and it seems also postgreSQL thanks to
> Stefan Rybacki), the all have problem with the OR join condition. They all
> will materialize the table A-B-C, which in my case means 200'000 records.

What do you mean by 'will materilize' ? There is no materialzation gong on in Oracle:

SQL> SELECT *
  2 from A
  3 left outer join B on A.id = B.aid
  4 left outer join C on A.id = C.aid
  5 inner join D on
  6 d.bid = B.id
  7 or d.bid = C.bid
  8 inner join E on E.id = D.eid
  9 where E.m1 = 'A'
 10 /

Execution Plan


   0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=15 Card=32 Bytes=2240)    1 0 FILTER

   2    1     HASH JOIN (OUTER) (Cost=15 Card=32 Bytes=2240)
   3    2       HASH JOIN (OUTER) (Cost=12 Card=32 Bytes=1696)
   4    3         MERGE JOIN (CARTESIAN) (Cost=8 Card=32 Bytes=1408)
   5    4           TABLE ACCESS (BY INDEX ROWID) OF 'D' (TABLE) (Cost=1 
Card=1 Bytes=8)
   6    5             NESTED LOOPS (Cost=4 Card=2 Bytes=64)
   7    6               TABLE ACCESS (BY INDEX ROWID) OF 'E' (TABLE) (Cost=2 
Card=2 Bytes=48)
   8    7                 INDEX (RANGE SCAN) OF 'XEM1' (INDEX) (Cost=1 
Card=2)
   9    6               INDEX (RANGE SCAN) OF 'XDE' (INDEX) (Cost=0 Card=1)
  10    4           BUFFER (SORT) (Cost=7 Card=16 Bytes=192)
  11   10             TABLE ACCESS (FULL) OF 'A' (TABLE) (Cost=2 Card=16 
Bytes=192)
  12    3         TABLE ACCESS (FULL) OF 'B' (TABLE) (Cost=3 Card=4 
Bytes=36)
  13    2       TABLE ACCESS (FULL) OF 'C' (TABLE) (Cost=3 Card=12 
Bytes=204)

There are just two hash joins, one merge join and one nested loop, that's all.

> These 200'000 records are then joined to the D-E tables.
> For some reason, perhaps perfectly valid reason, the OR condition causes
> this strange behavior. I am looking for either an other way to do the
> query
> with the OR join, or an explanation for the way that this query gets
> interpreted.
>
> For the second query where the OR join is done with 'union all', none of
> the
> have a problem using the indexes to produce the resultsets.
>
> Do you see it differently ?
>

What's important, the execution plans may be substantially different on on different data volumes. As I said, you need to prepare a test case that would have the same exec. plan as the production has and would demonstrate that the second query performs better. Analyzing plans for tiny data sets, whilst fun, is pretty fruiteless.

> Now my problem with the 'union all' vs the 'left outer join', is really
> first of all performance, and also the maintenance of the query. Obviously
> in my production system there are much more columns, and reference tables
> that are joined to the resultset. And further the query is wrapped inside
> a
> view.
>
> Henrik
Received on Tue Nov 08 2005 - 12:17:54 CET

Original text of this message