Re: Outer Joins are Evil? - NO, with detailed discussion

From: Dan Tow <dantow_at_singingsql.com>
Date: Tue, 31 Aug 2004 12:33:30 -0500
Message-ID: <1093973610.4134b66aef444@www.singingsql.com>

I strongly believe the whole business of outer joins being more expensive than inner joins is simply a *myth*, albeit a myth based on a bit of mostly-outdated truth behind it:

-It is certainly true that the now-outdated Oracle rule-based optimizer has at least one important bug that makes it harder to get correct execution plans for a (relatively small) subset of outer-join queries. Specifically, the RBO does not attempt OR-decomposition in the same query block as an outer join, so, for example

select ... from emp e, dept d where e.deptno=d.deptno(+) and e.empno in (1,2)

will not (on the RBO) follow the index into e.empno (OR-depcomposed into e.empno=1 OR e.empno=2) followed by nested loops into dept on deptno. Where the indexes are selective enough, and the tables big enough, this was a major problem in the RBO, though it had workarounds.

-It is certainly true that a silly-and-unnecessary outer join can prevent a join order that is best. For example,

select ... from emp e, dept d where e.deptno(+)=d.deptno and e.empno=1

will generally not reach emp first on the primary key empno, followed by a nested loop to dept, since it is on the "(+)" side of the outer join. (The optimizer could deduce that the outer join can be made inner, since the condition "e.empno=1" can only be true in the inner case, anyway. I know that older versions of Oracle couldn't do this, though. I confess I haven't tried it on newer versions - it just hasn't come up - anyone out there know if the optimizer automates this conversion, now?) This sort of outer join is generally not necessary, though, since it is better to make the outer join an inner join in a case like this where you clearly don't intend to get the outer case, anyway. (I grant that this is inconvenient in the case of views, though - a view that does the join "e.deptno(+)=d.deptno" will not be efficently usable in a query on "empno=1" until this has been fixed in the optimizer (if it hasn't). In such a case, you'd need to use a similar view with the join made inner, but that would defeat some of the code-reduction goal of using views extensively. (I point out in the book, though, that there are *many* performance penalties to over-free use of views, especially complex views, so elimination of outer joins still will not free you to use views indiscriminately.)

-DB2 had an even worse bug that resulted in super-long parses that finally errored out after about a half hour when you had too many (more than about 15) outer joins in the same query block. I expect this is fixed in more-recent versions of DB2, but I haven't used these, so I'm not sure. I describe the (rather complex) workaround for this bug on page 101 of my book (SQL Tuning, O'Reilly).

I've never seen any evidence that outer joins perform badly except where they were affected by at least one of the above issues. (I'm happy to hear of measured, empirical evidence to prove me wrong, though. I think a sufficient proof would be two queries that get essentially the *same* execution plans (except for addition of "OUTER" to the join methods) against the same data, where the outer-joined version is measurably, reproducibly slower in repeatable tests that give no caching advantage to the inner-joined version.)

*In theory, at least*, there's not a single reason why outer joins should be slower than inner joins with artificial default keys and rows for what would have been the outer cases. *On the contrary*, the outer joins should be easier! In particular, where a nested-loops join is best, the outer join will generally find a NULL in the foreign key pointing toward the "(+)" side of the join, and will not need any logical I/O at all to perform the join. In contrast, the inner-join workaround design will find a real, non-null foreign key that points to a real row of the other table (for example, a special "This_employee_has_no_department" department created to avoid the outer case of what you'd otherwise have to express as "e.deptno=d.deptno(+)"). That row will surely be well-cached if the outer case is common, but you'll still need to do logical I/O (with CPU costs) to both the index and the table to reach it, repeatedly. Even worse, super-high-access blocks such as the index and table blocks holding these default rows, where the outer case (which you are eliminating) would have been common, can themselves be serialization points, with bottlenecks on latches controlling access to the shared memory holding these blocks. With hash and sort-merge joins (*where these are optimal*) the difference between the inner and outer versions should virtually vanish - the outer version must read a single extra does-not-apply default row for the table on the "(+)" side, *once*, and that's unlikely to cost enough to measure, much less to matter.

Even parses of outer-joined queries *should* be easier, correctly implemented, because the optimizer can know with dead certainty that an outer join to a primary key (followed by no filter on the outer-joined table, as is usual) will result in *precisely* as many rows after the join as it had before the join, and this greatly simplifies the question of where to put these joins in the join order, and how many permutations in join-order need be examined.

Thanks,

Dan Tow
650-858-1557
www.singingsql.com

Quoting "Barr, Stephen" <Stephen.Barr_at_bskyb.com>:

> I'm currently working on a datawarehouse project (~5 Tb) where the decision
> has been made to avoid performing outer joins.
>
> The reasons given for this seem to be -
>
> 1. Simplifies user navigation of the structures - i.e. avoids outer
> joins.
> 2. Outer joins are slow and should be avoided at all costs.
> 3. If an FK is missing it is populated with a default value which will
> relate to an actual row in the target table, hence no rows will ever be
> dropped - again, supposedly this is to simply SQL and avoid outer joins.
>
> What they actually do is populate each table in the structure with three
> default rows with an SK of 0, 1 & 2. Any FK's which are missing, not
> applicable or invalid will point to one of these rows.
>
> Now....my question is....what is so inherently evil about outer joins that
> we go to this extreme to avoid them?
>
> AND...has anyone else seen something like this deployed in other places?
>
> Thanks,
>
> Steve.
>
>
>
>
> -----------------------------------------------------------------------
> Information in this email may be privileged, confidential and is
> intended exclusively for the addressee. The views expressed may
> not be official policy, but the personal views of the originator.
> If you have received it in error, please notify the sender by return
> e-mail and delete it from your system. You should not reproduce,
> distribute, store, retransmit, use or disclose its contents to anyone.
>
> Please note we reserve the right to monitor all e-mail
> communication through our internal and external networks.
> -----------------------------------------------------------------------
>
>
>
> ----------------------------------------------------------------
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> ----------------------------------------------------------------
> To unsubscribe send email to: oracle-l-request_at_freelists.org
> put 'unsubscribe' in the subject line.
> --
> Archives are at http://www.freelists.org/archives/oracle-l/
> FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
> -----------------------------------------------------------------
>

Please see the official ORACLE-L FAQ: http://www.orafaq.com

To unsubscribe send email to: oracle-l-request_at_freelists.org put 'unsubscribe' in the subject line.

--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Received on Tue Aug 31 2004 - 12:31:16 CDT