union all vs. left outer join

From: <henrik.sorensen_at_balcab.ch>
Date: Mon, 07 Nov 2005 22:22:52 +0000
Message-ID: <dkoca9$ohi$1_at_news.hispeed.ch>

Warning long mail ...

For several days I have tried to figure out why two sql with an union all is much faster than using a single sql with left outer join. But I cannot get my head around this one. It is quite possible I am overlooking something simple...

I have tried the following sql on both DB2 and Oracle, and didn't get the expected result in any of the cases. The explain of the query looked remarkable alike, so maybe there is something wrong with my understanding of how the joins should work.

When I look at the explain for this query:

from A
left outer join B on A.id = B.aid
left outer join C on A.id = C.aid
inner join D      on
   d.bid = B.id
or d.bid = C.bid
inner join E      on E.id = D.eid
where E.m1 = 'A'

it presents me with the unforunate news that the A-B-C will be materialised using index scans and later joined with D-E. So at least this is why the query is very slow.

I would have expected to see the result of the D-E table to be used as the basis for the whole select.

There is an important fact missing from the query: One 'A.id' can be either in B or C, but never in both B and C.

I have tried many combinations including explicit checking if B.id or C.bid is null etc, but it didn't give the expected result.

When I split the query into these two with union all,

SELECT a.*,d.*,e.*
from A
left outer join B on A.id = B.aid
inner join D      on
   d.bid = B.id
inner join E      on E.id = D.eid
where E.m1 = 'A'

union all

SELECT a.*,d.*,e.*
from A
left outer join C on A.id = C.aid
inner join D      on
  d.bid = C.bid
inner join E      on E.id = D.eid
where E.m1 = 'A'

The explain looks really good for each select, and do work as I would expect.

If you want to try it out here is  a test case with table definitions and test data.

create table E(id integer not null
, m1 char(1)
, m2 char(20)
, primary key(id));

create unique index XE on E(id);
create index XEM1 on E(m1);

create table A (id integer not null
, n char(8)
, primary key(id));

create unique index XA on A (id);

create table B(id integer not null
, aid integer
, s char(1)
, primary key(id)
, foreign key (aid) references A );

create unique index XB on B (id);
create unique index XBA on B(aid);

create table C(id integer not null
, bid integer
, aid integer
, seq integer
, x float
, primary key(id)
, foreign key (bid) references B
, foreign key (aid) references A );

create unique index XC on C (id);
create unique index XCA on C(aid);
create index XCB on C(bid);

create table D(bid integer not null
, eid integer
,  k char(1)
, primary key(bid)
, foreign key (bid) references B
, foreign key(eid) references E);

create unique index XD on D(bid);
create index XDE on D(eid);

Test data

delete from C;
delete from D;
delete from B;
delete from A;
delete from E;

insert into E (id,m1,m2) values(100000,'A','FIRST-A');
insert into E (id,m1,m2) values(100100,'B','FIRST-B'); insert into E (id,m1,m2) values(100200,'B','SECOND-B');
insert into A (id,n) values(200001,'IN B-001');
insert into A (id,n) values(200002,'IN B-002');
insert into A (id,n) values(200003,'IN B-003');
insert into A (id,n) values(200004,'IN B-004');

insert into A (id,n) values(200201,'IN C-201');
insert into A (id,n) values(200202,'IN C-202'); insert into A (id,n) values(200203,'IN C-203');
insert into A (id,n) values(200211,'IN C-211');
insert into A (id,n) values(200212,'IN C-212');
insert into A (id,n) values(200213,'IN C-213');

insert into A (id,n) values(200221,'IN C-221');
insert into A (id,n) values(200222,'IN C-222');
insert into A (id,n) values(200223,'IN C-223');
insert into A (id,n) values(200224,'IN C-224');
insert into A (id,n) values(200225,'IN C-225'); insert into A (id,n) values(200226,'IN C-226');
insert into B (id,aid,s) values(300001,200001,'O');
insert into B (id,aid,s) values(300002,200002,'P');
insert into B (id,aid,s) values(300003,200003,'Q');
insert into B (id,aid,s) values(300004,200004,'R');

insert into C (id,bid,aid,seq,x) values(400001,300001,200201,1,-1);
insert into C (id,bid,aid,seq,x) values(400002,300001,200202,2,-1); insert into C (id,bid,aid,seq,x) values(400003,300001,200203,3,-1);
insert into C (id,bid,aid,seq,x) values(400004,300002,200211,1,-1);
insert into C (id,bid,aid,seq,x) values(400005,300002,200212,2,-1);
insert into C (id,bid,aid,seq,x) values(400006,300002,200213,3,-1);

insert into C (id,bid,aid,seq,x) values(400007,300003,200221,1,-1);
insert into C (id,bid,aid,seq,x) values(400008,300003,200222,2,-1);
insert into C (id,bid,aid,seq,x) values(400009,300003,200223,3,-1);
insert into C (id,bid,aid,seq,x) values(400010,300003,200224,4,-1);
insert into C (id,bid,aid,seq,x) values(400011,300003,200225,5,-1); insert into C (id,bid,aid,seq,x) values(400012,300003,200226,6,-1);
insert into D (bid,eid,k) values(300001,100000,'Y');
insert into D (bid,eid,k) values(300002,100100,'Y');
insert into D (bid,eid,k) values(300003,100100,'Y');
insert into D (bid,eid,k) values(300004,100200,'Y');

Any ideas, or hints would be much appreciated.

Henrik Received on Mon Nov 07 2005 - 23:22:52 CET

Original text of this message