Re: Join order and intermediate results

From: Dan Tow <dantow_at_singingsql.com>
Date: Fri, 1 Oct 2004 13:14:54 -0500
Message-ID: <1096654494.415d9e9ec0e22@www.singingsql.com>

I see that Mark Farnham already did a very nice job of answering the question of how to accomplish this, so I'll focus on the question of "Why?". As it turns out, the need for this sort of unusual join-tree is quite rare. As you already noticed, John, Oracle has a strong tendency to join tables one-at-a-time, something like A to B to get (AB), followed by (AB) to C to get (ABC), followed by (ABC) to D to get (ABCD),..., and with normal hints like ORDERED (without using inline-view tricks, that is), all you can force is which table is "A" (first), which is "B" (second), which is "C" (third), etc., in this one-at-a-time join style.

So, let's examine what normally happens with a 4-way join where you might want to join A to B to make (AB), C to D to make (CD), followed by a join (presumably a hash or a sort-merge join, since there can't be a useful nest-loops path into a pre-joined result) of (AB) to (CD):

In a normal join tree, one of these 4 tables will be the "detail" table of the whole query, and joins will reach the other 3 tables by three detail-to-master joins that begin from that table. Since we can assume there's a join between A and B and another join between C and D, we can assume, without any significant loss of generality that A is the detail table, and that the joins look either like

A.Fkey1=B.Pkey1 AND
C.Fkey2=D.Pkey2 AND
A.FKey3=C.PKey3

A.Fkey1=B.Pkey1 AND
C.Fkey2=D.Pkey2 AND
B.FKey3=C.PKey3

Now, assuming, again without loss of generality, that A or B is the detail table, and that table is large enough that optimization is an issue, it is almost invariably the case that you are going to need an indexed path such that the number of logical I/Os to achieve (AB) is at least, roughly, as high as (and quite possibly several times higher than) the number of rows in the join result (AB) (since each result row will probably map to a specific table-access by ROWID, not even counting other indexed accesses and the table access for the other table). So, if (AB) costs N logical I/Os, it is very unlikely to contain more than N rows. However, if it contains N rows, and if all the primary keys are indexed, then there has to be a nested-loops path from (AB) to C and D *individually* that does *at most* 2N index unique scans and 2N table accesses by ROWID, *without* pre-joining C and D. Since an index unique scan is very unlikely to require more than 4 (usually less) logical I/Os, and a table access by ROWID requires precisely 1 logical I/O, the entire worst-case plan that drives from (AB) to C and D individually should have an upper-bound cost of 11N logical I/Os if (AB) cost N logical I/Os. Furthermore, the ratio of *runtime* is probably even better (a lot better, usually) than 11-to-1 between the whole cost and the cost of (AB), because, A or B being the detail table, and therefore probably the biggest table, you are more likely to see physical I/O during the reads of A and B than during the reads of C and D. Even if the join (CD) was *free*, that would mean that the best one-at-a-time plan in the style A-to-B-to-C-to-D or A-to-B-to-D-to-C should cost no more than 11 times as much as the best plan

(A-to-B)-to-(C-to-D)

Keep in mind that all this is true even restricting ourselves nested-loops joins to C and D. When we allow for hash joins (one at a time, only where they are best) to C and/or D, the likelihood of dramatic advantages for (A-to-B)-to-(C-to-D) grows even smaller.

When you factor in all the practical details, the probable ratio of the best plan in the one-at-a-time style to the best plan that first pairs tables off is *usually* <= one (one-at-a-time wins or ties), and *very* rarely more than 2 or 3. In fact, I've known of the possibility of plans joining already-joined results for years, and it's a trick I haven't needed even once (which is why I didn't talk about it in my book) to get an excellent result in real-world tuning.

So, I'm fascinated that you appear to have found a case where the sum of the costs of (AB) and (CD) was 85 logical I/Os (implying either very small tables or excellent indexed paths to less than 85 rows for both joins), but did not find a one-at-a-time path that cost less than (or even close to!) 935 logical I/Os (11*85). Without going into too much more tedious detail, even unusual join trees that do not have a single detail table, as long as all joins have a primary key on one end, and point to a single table on both ends, also obey this 11-to-1 upper bound (between the best one-at-a-time cost and the best (AB)-to-(CD) cost) if we allow for hash joins to individual tables. I can only see a few possibilities:

-You're missing at least one primary-key index.

-You didn't find the *best* one-at-a-time join order, and the best one would cost less than 935 logical I/Os, and run in less than a second.

-You have an unusual join tree, where at least one of the joins is many-to-many, or some even more unusual feature/ I discuss most cases of unusual join trees in my book , and they usually indicate a functional mistake in the query, though there are of course exceptions to that rule.

Thanks,

Dan Tow
650-858-1557
www.singingsql.com

Quoting Smiley John - IL <SMILEYJ_at_tusc.com>:

> Suppose we have four tables A, B, C, and D that are all used in a SELECT
> statement. Let's further suppose that we know that the best way to join
> these tables (best meaning fastest RT, and fewest LIOs) is to join A to B to
> produce row source AB, join C to D to produce row source CD, then join AB to
> CD.
>
> How do I get Oracle to do this? At first I simply wrote the query as a
> simple SELECT like this:
>
> SELECT ...
> FROM A, B, C, D
> WHERE ...
>
> When that didn't work and I made sure the table, index, and column stats
> were correct, I tried various hints for join order and join method. When
> that didn't work, I decided to smack Oracle in the face with the answer like
> this:
>
> SELECT ...
> FROM (SELECT ... FROM A, B WHERE ...) AB, (SELECT ... FROM C, D WHERE ...)
> CD
> WHERE ...
>
> However, it stubbornly refused to join in the correct order and I had to
> result to storing the intermediate results in GTTs and then joining the
> GTTs.
>
> This is not a very satisfying means of solving the problem. Oracle seems to
> doggedly follow a linear path for satisfying the query, which goes something
> like this: join two row sources to create an intermediate result, pick
> another row source and join it to the intermediate result to produce a new
> intermediate result, repeat until done (in the case of parallel query, each
> PQ slave seems to follow this same approach and then the query coordinator
> merges the results).
>
> Expressed as a tree, it looks like this (where / | and \ are join
> operations):
>
> D (or C)
> /
> C (or D)
> /
> A
> |
> B
>
> When what I want is this:
>
> Result
> / \
> A C
> | |
> B D
>
>
> In the case I'm describing, this makes the difference between a response
> time of 0.1 seconds with 85 LIOs vs. a response time of 120 seconds with
> 60,000 LIOs.
>
> I have a feeling I'm going to be smacking my forehead on this one, but I
> just don't see what I'm missing.
>
> John Smiley
>
> P.S. This is 9.2.0.4
>
> --
> http://www.freelists.org/webpage/oracle-l
>

--
http://www.freelists.org/webpage/oracle-l

Received on Fri Oct 01 2004 - 13:10:28 CDT