Re: union all vs. left outer join

From: vc <boston103_at_hotmail.com>
Date: 10 Nov 2005 12:02:18 -0800
Message-ID: <1131652938.674700.38970_at_f14g2000cwb.googlegroups.com>


henrik.sorensen_at_balcab.ch wrote:
"Just to summarize the problem as I see it. The query 1) should be the fastest, and is also much easier to maintain.
Now, the problem OR join, has so far resulted in a table scan of A-B-C, and
not as I expect a indexed lookup on B.id and C.bid respectively. Ideally the optimizer should do something like this: 1) use index E.XM1
2) use index D.XDE

3.1) use index B.XB
3.2) use index B.XBA
4.1) use index C.XC
4.2) use index C.XCA

5) use index A.XA

do you agree with this ?

looking forward to hear your results.

"

Using the data you've sent, Oracle produces the following execution plans:

SELECT a.*,d.*,e.*
from A
left outer join B on A.id = B.aid
left outer join C on A.id = C.aid
inner join D on

   d.bid = B.id
or d.bid = C.bid
inner join E on E.id = D.eid
where E.m1 = 'A'

25824 rows selected.

Elapsed: 00:00:20.03

Execution Plan


   0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=9285 Card=99697 Bytes=8374548)

   1 0 CONCATENATION

   2    1     HASH JOIN (OUTER) (Cost=4832 Card=52143 Bytes=4380012)
   3    2       HASH JOIN (Cost=4081 Card=52143 Bytes=3806439)
   4    3         TABLE ACCESS (FULL) OF 'E' (TABLE) (Cost=201
Card=15630 Bytes=422010)
   5    3         HASH JOIN (Cost=2709 Card=358780 Bytes=16503880)
   6    5           TABLE ACCESS (FULL) OF 'D' (TABLE) (Cost=142
Card=263420 Bytes=2897620)
   7    5           HASH JOIN (RIGHT OUTER) (Cost=956 Card=358780
Bytes=12557300)
   8    7             TABLE ACCESS (FULL) OF 'C' (TABLE) (Cost=84
Card=81910 Bytes=1802020)
   9    7             TABLE ACCESS (FULL) OF 'A' (TABLE) (Cost=225
Card=358780 Bytes=4664140)
  10    2       TABLE ACCESS (FULL) OF 'B' (TABLE) (Cost=142
Card=288840 Bytes=3177240)
  11    1     FILTER
  12   11       HASH JOIN (RIGHT OUTER) (Cost=4453 Card=47554
Bytes=3994536)
  13   12         TABLE ACCESS (FULL) OF 'C' (TABLE) (Cost=84
Card=81910 Bytes=1802020)
  14   12         HASH JOIN (Cost=4023 Card=47554 Bytes=2948348)
  15   14           TABLE ACCESS (FULL) OF 'E' (TABLE) (Cost=201
Card=15630 Bytes=422010)
  16   14           HASH JOIN (Cost=2946 Card=327205 Bytes=11452175)
  17   16             TABLE ACCESS (FULL) OF 'D' (TABLE) (Cost=142
Card=263420 Bytes=2897620)
  18   16             HASH JOIN (RIGHT OUTER) (Cost=1469 Card=358780
Bytes=8610720)
  19   18               TABLE ACCESS (FULL) OF 'B' (TABLE) (Cost=142
Card=288840 Bytes=3177240)
  20   18               TABLE ACCESS (FULL) OF 'A' (TABLE) (Cost=225
Card=358780 Bytes=4664140)

Statistics



..
       6872  consistent gets
      12035  physical reads

..

and :

SELECT a.*,d.*,e.*
from A
left outer join B on A.id = B.aid
inner join D on

   d.bid = B.id
inner join E on E.id = D.eid
where E.m1 = 'A'
union all
SELECT a.*,d.*,e.*
from A
left outer join C on A.id = C.aid
inner join D on
  d.bid = C.bid
inner join E on E.id = D.eid
where E.m1 = 'A'

25824 rows selected.

Elapsed: 00:00:18.08

Execution Plan


   0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=7724 Card=99697 Bytes=6185803)

   1 0 UNION-ALL

   2    1     HASH JOIN (Cost=3961 Card=47554 Bytes=2900794)
   3    2       TABLE ACCESS (FULL) OF 'E' (TABLE) (Cost=201 Card=15630
Bytes=422010)
   4    2       HASH JOIN (Cost=2902 Card=327205 Bytes=11124970)
   5    4         TABLE ACCESS (FULL) OF 'D' (TABLE) (Cost=142
Card=263420 Bytes=2897620)
   6    4         HASH JOIN (RIGHT OUTER) (Cost=1449 Card=358780
Bytes=8251940)
   7    6           TABLE ACCESS (FULL) OF 'B' (TABLE) (Cost=142
Card=288840 Bytes=2888400)
   8    6           TABLE ACCESS (FULL) OF 'A' (TABLE) (Cost=225
Card=358780 Bytes=4664140)
   9    1     HASH JOIN (Cost=3763 Card=52143 Bytes=3285009)
  10    9       TABLE ACCESS (FULL) OF 'E' (TABLE) (Cost=201 Card=15630
Bytes=422010)
  11    9       HASH JOIN (Cost=2585 Card=358780 Bytes=12916080)
  12   11         TABLE ACCESS (FULL) OF 'D' (TABLE) (Cost=142
Card=263420 Bytes=2897620)
  13   11         HASH JOIN (RIGHT OUTER) (Cost=1082 Card=358780
Bytes=8969500)
  14   13           TABLE ACCESS (FULL) OF 'C' (TABLE) (Cost=84
Card=8910 Bytes=982920)
  15   13           TABLE ACCESS (FULL) OF 'A' (TABLE) (Cost=225
Card=358780 Bytes=4664140)

Statistics



..
       5901  consistent gets
      10669  physical reads

...

The plans are pretty similar with the 'union all' being slightly more efficient (by about 15% judging by the execution times). Both perform a number of hash [outer] joins without any index access. The second plan is slightly better because with your specific data you knew that you could transform (AB+)C+DE into (AC+)DE union all (AB+)DE and save one join operation. In general, such transformation would produce different results.

Now, as to you questions:

Having indexes does not necessarily mean that the optimizer will use them with the queries like that. E.g. a simple query like

select * from a join b on a.id=b.aid

will result in two full table scans even though there indexes on a.id and b.aid because there's no point of going to the index and then to the respective row if we need to scan all the rows anyway. Similarly, it's faster to have a full scan of A because the index selectivity is low: with m1='A', you'll get 15K rows out of 400K. If the index selectivity were better the optimizer would use it. So, with your queries, the optimizer evaluated various access paths and chose ful table scans as less expensive in terms of IO and CPU. Received on Thu Nov 10 2005 - 21:02:18 CET

Original text of this message