
Re: Join order and intermediate results

From: Dan Tow <>
Date: Sun, 3 Oct 2004 19:22:47 -0500
Message-ID: <>


You're most welcome. I have some responses below, in-line.


Dan Tow

Quoting Smiley John - IL <>:

> Dan,
> Thanks for taking the time to provide such a detailed explanation. I'm
> probably missing something, but it appears that this explanation of how
> one-at-a-time (linear) join orders can match (or nearly so) the performance
> of other join orders is based on the assumption that the tables have a
> master-detail relationship of some kind. Queries in a data warehouse rarely
> have this property.

It has not been my experience, for even the largest queries I've dealt with, that the master-detail rule somehow fails to hold for data-warehouse queries. Admittedly, most (but not all) of my experience with data-warehouse-type queries has been with very large, batch queries running against mixed-use databases, rather than pure data-warehouse applications, so my set of datapoints surely differs from yours. I think I have a pretty good theoretical reason to believe that a well-designed data-warehouse application would tend to avoid these as well, though, as I describe below.

> In this particular case, joining A to B gave a
> relatively small result set and joining C to D gave a relatively small
> result set, so that joining AB to CD was very efficient.
> The other combinations were either very inefficient or had no semantic use:
> * A cannot be joined to C or D (other than a meaningless Cartesian
> product) because they have no columns in common
> * D cannot be joined to A or B (other than a meaningless Cartesian
> product) because they have no columns in common
> * B joined to C without filtering B through A and C through D gave an
> enormous intermediate result set

OK, now to the theoretical argument for why we should tend to avoid these many-to-many joins, using your case as an example. I'll make a couple of guesses about the undescribed details of your case just to make this concrete, but alternate examples consistent with the information you provide should look much the same:

You have a normal join from A to B, which I'll guess is in the direction (toward the primary key of) B, with a super filter on A, such that the join of A and B has at most a few hundred rows:

A (Fa<<1)
|
B

with a join that looks like A.Fkey1=B.Pkey1. Both A and B are very large (likely A being largest, since it is the detail table), but the filter condition on A is so selective that the filter ratio on A is a very small fraction, maybe Fa=0.0001 or less, resulting in a result set for (AB) that is Fa*Ca, where Ca is the cardinality of A. (This assumes, as is usually the case, that the join itself, from A to B is "non-filtering.")
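To make that arithmetic concrete (the cardinality and filter ratio here are invented figures for illustration, not taken from the original poster's schema):

```python
# Illustrative arithmetic only -- Ca and Fa are made-up numbers.
Ca = 5_000_000   # cardinality of detail table A
Fa = 0.0001      # filter ratio of the selective predicate on A

# Assuming the join A.Fkey1 = B.Pkey1 is non-filtering (every A row
# matches exactly one B row), the (AB) result set is simply Fa * Ca.
rows_AB = Fa * Ca
print(rows_AB)   # 500.0 -- a few hundred rows, as described
```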

Likewise, you have a normal join from C to D, which I'll assume is in that direction, which also returns at most a few hundred rows:

C
|
D (Fd<<1)

with a join that looks like C.Fkey2=D.Pkey2. Both C and D are very large (likely C being largest, since it is the detail table), but the filter condition on D is so selective that the filter ratio on D is a very small fraction, maybe Fd=0.0001 or less, resulting in a result set for (CD) that is Fd*Cc, where Cc is the cardinality of C. (This assumes, as is usually the case, that the join itself, from C to D, is "non-filtering.")

Finally, you have a many-to-many join between B and C:


on some many-to-many match like B.Colx=C.Colx, where the join columns are not even close to unique on either end of the join:

A (Fa<<1)
|
B - - - - C
          |
          D (Fd<<1)

I agree that in this scenario, you're better off doing as you are doing, a many-to-many join of (AB) and (CD), rather than following a one-at-a-time path from A to B to C to D (if Fa<Fd, generally), or D to C to B to A (if Fd<Fa, generally). If I were writing the application, though, I'd opt for a third alternative:

Let's make the problem concrete by guessing that (AB) and (CD), with the great filters on A and D, each return 30 rows by themselves. If that's the case, then your full query will return anywhere between 0 rows (if the set of B.Colx values in (AB) has no overlap with the set of C.Colx values) and 900 rows (if all rows in (AB) have a single B.Colx value, and it is the same as C.Colx for every row in (CD)). In every case, the set of combinations (AB) and the set of combinations (CD), queried separately, and perhaps sorted by Colx for convenience, have no less information than the result of your full query. But the upper limit of the total number of rows returned by a query of (AB) separate from a query of (CD) is 60, in my example, instead of 900. And even if you get lucky and find few many-to-many matches after applying the filters, you'll still do just as much work in the database (if you optimize as you have) querying ABCD together as you would querying (AB) and (CD) separately.

In the single-query result, for any given value of Colx, you have a Cartesian product of all the rows in (AB) having that value, and all the rows of (CD) having that value, but the Cartesian product reveals absolutely nothing more than the two sets, (AB) and (CD), queried separately, and sorted for convenience by Colx, with perhaps a post-query step to discard rows having Colx values not found in the other set.

Any time you have a many-to-many join, you have the potential for this sort of Cartesian-product-generated redundancy, and it is rarely useful, in my experience. More often, developers fail to even test for a case where the Cartesian product is really many-to-many (for example, only testing cases where (AB) or (CD) happen to be unique on Colx after the very powerful filters are applied). In these cases, developers often assume that, like their test cases, rows of the Cartesian product will map one-to-one with some entity, for example, the entity that maps to table A, and will have application logic that fails in scary ways when it turns out that sometimes the query returns multiple rows for a given "A" entity/row. When the application explicitly breaks the logic up to handle (AB) and (CD) as separate queries (which have no failure to map to a table entity, having only something-to-one joins), and explicitly handles the combinations of these two sets for any given Colx, this sort of logic mistake becomes much less likely.
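One way to sketch that "separate queries plus explicit combination" approach in client code (the function name, row shapes, and values are invented for illustration):

```python
from collections import defaultdict

def combine(ab_rows, cd_rows):
    """Group each result set by its Colx value, then keep only the
    Colx values present on both sides -- the same information the
    four-way join returns, without materializing the Cartesian product."""
    ab_by_colx = defaultdict(list)
    cd_by_colx = defaultdict(list)
    for row in ab_rows:
        ab_by_colx[row["colx"]].append(row)
    for row in cd_rows:
        cd_by_colx[row["colx"]].append(row)
    # The application then handles each (ab_group, cd_group) pair
    # explicitly, so a genuinely many-to-many match can't silently
    # break one-row-per-entity logic.
    return {x: (ab_by_colx[x], cd_by_colx[x])
            for x in ab_by_colx.keys() & cd_by_colx.keys()}

pairs = combine([{"colx": 1, "a": 10}, {"colx": 2, "a": 11}],
                [{"colx": 1, "d": 20}, {"colx": 3, "d": 21}])
print(sorted(pairs))   # [1] -- only Colx=1 matches on both sides
```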

> One could argue that the data model was not suited to this query, but that
> doesn't solve the problem and the data model may be very well suited to many
> other classes of queries that are useful to end users.
> Also, this sentence left me wondering how non-Oracle databases store and
> retrieve data:
> "So, if (AB) costs N logical I/Os, it is very unlikely to contain more than
> N rows."
> I assume this is not referring to Oracle databases since it is very easy to
> write meaningful queries in Oracle that return many more rows than the
> number of LIOs needed to retrieve them. Multiple rows per block and a good
> clustering factor make this a snap.

I was referring to Oracle, but I oversimplified, looking at the worst case, where clustering doesn't help. As near as I can tell, clustering didn't do a thing for reducing logical I/O in earlier Oracle versions, though it was very useful for reducing physical I/O. I have noticed that Oracle seems to take advantage of clustering for logical I/O reduction, though I don't know how much CPU (which is what *really* matters) that really saves. Whether or not clustering reduces logical I/O, the database must surely do *some* per-row work to do a table access by rowid, and your runtime ratios were consistent with your logical I/O ratios.
> And while the next statement is true, I didn't see how I could apply it
> except in unusual cases:
> "However, if it contains N rows, and if all the primary keys are indexed,
> then there has to be a nested-loops path from (AB) to C and D
> *individually* that does *at most* 2N index unique scans and 2N table
> accesses by ROWID, ..."
> This is only true if (1) all of the tables follow a master-detail chain and
> (2) I'm joining from the detail rows first (both of which you've stated as
> assumptions). However, this bounding formula only applies to the class of
> queries where it is more efficient to filter the detail rows first and then
> join to the parents. It appears to ignore the other very large class of
> useful queries where it is more efficient to start with the master rows
> first and work our way down.

Remember, I set the problem up at the beginning by assuming that the join tree was "well-formed", quoting myself:

> In a normal join tree, one of these 4 tables will be the "detail" table of
> the whole query, and joins will reach the other 3 tables by three
> detail-to-master joins that begin from that table.

I further assumed, without loss of generality, that said "detail" table was either A or B. (The argument just substitutes "C" for "A" and "D" for "B" if you lose this assumption.) With these assumptions, yes, you can reach the rest of the tables by detail-to-master joins. Now, you're absolutely right that a better path might be to start from C or D, if one of those has a better filter than either A or B (and I made no assumption, needing none, about whether (AB) started from the master or from the detail). However, remember that I only said that a nested-loops path from (AB) needed at most that many logical I/Os (far less than your query) to reach C and D, not that this was the best path. I'm looking at upper limits here, and the upper limit looked *way* better than the one-at-a-time path you got. This is explained by your revelation that my assumption of this being a normal join tree was wrong.
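Spelling out that upper bound, using the 85-LIO figure from the quoted thread as N purely for illustration (the bound assumes primary-key indexes on C and D):

```python
# Upper bound on logical I/O for the nested-loops path from (AB) to C
# and D individually. N is the (AB) row count; 85 is the figure quoted
# in this thread, reused here only to make the arithmetic concrete.
N = 85
index_unique_scans = 2 * N   # one probe into C's index and one into D's, per row
table_accesses     = 2 * N   # one table access by rowid into C and into D, per row
upper_bound_lio = index_unique_scans + table_accesses
print(upper_bound_lio)   # 340 -- far below the 935 LIOs the one-at-a-time plan cost
```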

> Finally, this statement seems to imply that one-at-a-time paths can always
> provide reasonable approximations to optimal join paths in terms of LIOs
> (maybe I'm reading too much into it):
> "So, I'm fascinated that you appear to have found a case where the sum of
> the costs of (AB) and (CD) was 85 logical I/Os (implying either very small
> tables or excellent indexed paths to less than 85 rows for both joins), but
> did not find a one-at-a-time path that cost less than (or even close to!)
> 935 logical I/Os (11*85)."
> It seems to me that the one-at-a-time approach has some significant
> drawbacks for certain classes of queries if the best it can achieve is an
> order of magnitude increase in LIOs over other join paths. The optimal join
> order in this case just leaps right out at you if you look at the schema and
> the data. It would be very nice if the CBO could see it too and I didn't
> have to resort to trickery such as adding a pseudo-column like ROWNUM in the
> inline views to force it out of the one-at-a-time path. Not only that,
> one-at-a-time paths are serial. Individual row source scans and joins can
> be performed in parallel, but the overall structure of the join order is
> serial whereas nonlinear join orders lend themselves to parallel execution
> by their very nature. This would seem to be a good reason to invest in
> improving the CBO to consider nonlinear join orders in parallel processing
> environments.

One-at-a-time join paths do not have to use serial plans, either - nothing inherently prevents the work driven from the driving table from being divided up. I *very* rarely find a query that needs a parallel plan to perform just fine, however, once it is well tuned. More often, I find parallelism hiding just how inefficient a non-optimal plan is, by throwing resources at it.

Yes, I am saying that one-at-a-time paths tend to perform roughly as well or better than more exotic paths, *in my own experience*, and the argument is based on my assumption of a "normal" join tree. The best counterexamples I've found have been queries that were wrong from a logical point of view, doing Cartesian products or many-to-many joins that the application was better off without. You may have found examples of functionally useful many-to-many joins (where the "many" on both sides really is "many" not "nearly one"), I grant - I just haven't seen them in my own experience, save maybe once in a blue moon.

> Regards,
> John Smiley

Received on Sun Oct 03 2004 - 19:18:22 CDT
