Date: Sun, 3 Oct 2004 19:22:47 -0500
From: Dan Tow <dantow@singingsql.com>
To: Smiley John - IL
Cc: "'oracle-l@freelists.org'"
Subject: Re: Join order and intermediate results

John,

You're most welcome. I have some responses below, in-line.

Yours,

Dan Tow
650-858-1557
www.singingsql.com

Quoting Smiley John - IL:

> Dan,
>
> Thanks for taking the time to provide such a detailed explanation. I'm
> probably missing something, but it appears that this explanation of how
> one-at-a-time (linear) join orders can match (or nearly so) the
> performance of other join orders is based on the assumption that the
> tables have a master-detail relationship of some kind. Queries in a data
> warehouse rarely have this property.

It has not been my experience, for even the largest queries I've dealt
with, that the master-detail rule somehow fails to hold for data-warehouse
queries. Admittedly, most (but not all) of my experience with
data-warehouse-type queries has been with very large, batch queries running
against mixed-use databases, rather than pure data-warehouse applications,
so my set of data points surely differs from yours. I think I have pretty
good theoretical reason to believe that a well-designed data warehouse
application would tend to avoid these, as well, though, as I describe
below.

> In this particular case, joining A to B gave a relatively small result
> set and joining C to D gave a relatively small result set, so that
> joining AB to CD was very efficient.
>
> The other combinations were either very inefficient or had no semantic
> use:
> * A cannot be joined to C or D (other than a meaningless Cartesian
>   product) because they have no columns in common
> * D cannot be joined to A or B (other than a meaningless Cartesian
>   product) because they have no columns in common
> * B joined to C without filtering B through A and C through D gave an
>   enormous intermediate result set

OK, now to the theoretical argument for why we should tend to avoid these
many-to-many joins, using your case as an example. I'll make a couple of
guesses about the undescribed details of your case just to make this
concrete, but alternate examples consistent with the information you
provide should look much the same:

You have a normal join from A to B, which I'll guess is in the direction
(toward the primary key of) B, with a super filter on A, such that the join
of A and B has at most a few hundred rows:

A (Fa<<1)
|
V
B

with a join that looks like A.Fkey1=B.Pkey1. Both A and B are very large
(likely A being largest, since it is the detail table), but the filter
condition on A is so selective that the filter ratio on A is a very small
fraction, maybe Fa=0.0001 or less, resulting in a result set for (AB) that
is Fa*Ca, where Ca is the cardinality of A. (This assumes, as is usually
the case, that the join itself, from A to B, is "non-filtering.")

Likewise, you have a normal join from C to D, which I'll assume is in that
direction, which also returns at most a few hundred rows:

C
|
V
D (Fd<<1)

with a join that looks like C.Fkey2=D.Pkey2. Both C and D are very large
(likely C being largest, since it is the detail table), but the filter
condition on D is so selective that the filter ratio on D is a very small
fraction, maybe Fd=0.0001 or less, resulting in a result set for (CD) that
is Fd*Cc, where Cc is the cardinality of C. (This assumes, as is usually
the case, that the join itself, from C to D, is "non-filtering.")

Finally, you have a many-to-many join between B and C:

B--C

on some many-to-many match like B.Colx=C.Colx, where the join columns are
not even close to unique on either end of the join:

A (Fa<<1)
|
V
B--C
   |
   V
   D (Fd<<1)
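(Spelled out as SQL, the shape I'm describing would look something like
this - the joins are just the ones above, while the filter columns and the
bind values are invented placeholders:)

    select *
      from a, b, c, d
     where a.fkey1 = b.pkey1     -- normal join, A toward B's primary key
       and c.fkey2 = d.pkey2     -- normal join, C toward D's primary key
       and b.colx  = c.colx      -- many-to-many join between B and C
       and a.filter_col = :val1  -- the super filter on A (Fa<<1)
       and d.filter_col = :val2  -- the super filter on D (Fd<<1)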
I agree that in this scenario, you're better off doing as you are doing, a
many-to-many join of (AB) and (CD), rather than following a one-at-a-time
path from A to B to C to D (if Fa < Fd) or from D to C to B to A (if
Fd < Fa).

> One could argue that the data model was not suited to this query, but
> that doesn't solve the problem and the data model may be very well suited
> to many other classes of queries that are useful to end users.
>
> Also, this sentence left me wondering how non-Oracle databases store and
> retrieve data:
>
> "So, if (AB) costs N logical I/Os, it is very unlikely to contain more
> than N rows."
>
> I assume this is not referring to Oracle databases since it is very easy
> to write meaningful queries in Oracle that return many more rows than the
> number of LIOs needed to retrieve them. Multiple rows per block and a
> good clustering factor make this a snap.

I was referring to Oracle, but I oversimplified, looking at the worst case,
where clustering doesn't help. As near as I can tell, clustering didn't do
a thing for reducing logical I/O in earlier Oracle versions, though it was
very useful for reducing physical I/O.
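(For what it's worth, the rows-versus-logical-I/Os comparison is easy to
make for any particular query with ordinary SQL*Plus autotrace - the table
and column names here are invented placeholders:)

    set autotrace traceonly statistics
    select * from big_table where indexed_col = 42;
    -- In the Statistics section, compare "consistent gets" (logical I/O)
    -- with "rows processed" - with tightly clustered data, rows returned
    -- can far exceed the blocks visited.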
I have noticed that Oracle seems to take advantage of clustering for
logical I/O reduction, though I don't know how much CPU (which is what
*really* matters) that really saves. Whether or not clustering reduces
logical I/O, the database must surely do *some* per-row work to do a table
access by ROWID, and your runtime ratios were consistent with your logical
I/O ratios.

> And while the next statement is true, I didn't see how I could apply it
> except in unusual cases:
>
> "However, if it contains N rows, and if all the primary keys are indexed,
> then there has to be a nested-loops path from (AB) to C and D
> *individually* that does *at most* 2N index unique scans and 2N table
> accesses by ROWID, ..."
>
> This is only true if (1) all of the tables follow a master-detail chain
> and (2) I'm joining from the detail rows first (both of which you've
> stated as assumptions). However, this bounding formula only applies to
> the class of queries where it is more efficient to filter the detail rows
> first and then join to the parents. It appears to ignore the other very
> large class of useful queries where it is more efficient to start with
> the master rows first and work our way down.

Remember, I set the problem up at the beginning by assuming that the join
tree was "well-formed", my own quote:

> In a normal join tree, one of these 4 tables will be the "detail" table
> of the whole query, and joins will reach the other 3 tables by three
> detail-to-master joins that begin from that table.

I further assumed without loss of generality that said "detail" table was
either A or B. (The argument just substitutes "C" for "A" and "D" for "B"
if you lose this assumption.) With these assumptions, yes, you can reach
the rest of the tables by detail-to-master joins. Now, you're absolutely
right that a better path might be to start from C or D, if one of those has
a better filter than either A or B (and I made no assumption, needing none,
about whether (AB) started from the master or from the detail). However,
remember that I only said that a nested-loops path from (AB) needed at most
that many logical I/Os (far less than your query) to reach C and D, not
that this was the best path - I'm looking at upper limits, here, and the
upper limit looked *way* better than the one-at-a-time path you got. This
is explained by your revelation that my assumption of this being a normal
join tree was wrong.

> Finally, this statement seems to imply that one-at-a-time paths can
> always provide reasonable approximations to optimal join paths in terms
> of LIOs (maybe I'm reading too much into it):
>
> "So, I'm fascinated that you appear to have found a case where the sum of
> the costs of (AB) and (CD) was 85 logical I/Os (implying either very
> small tables or excellent indexed paths to less than 85 rows for both
> joins), but did not find a one-at-a-time path that cost less than (or
> even close to!) 935 logical I/Os (11*85)."
>
> It seems to me that the one-at-a-time approach has some significant
> drawbacks for certain classes of queries if the best it can achieve is an
> order of magnitude increase in LIOs over other join paths. The optimal
> join order in this case just leaps right out at you if you look at the
> schema and the data. It would be very nice if the CBO could see it too
> and I didn't have to resort to trickery such as adding a pseudo-column
> like ROWNUM in the inline views to force it out of the one-at-a-time
> path. Not only that, one-at-a-time paths are serial. Individual row
> source scans and joins can be performed in parallel, but the overall
> structure of the join order is serial whereas nonlinear join orders lend
> themselves to parallel execution by their very nature. This would seem to
> be a good reason to invest in improving the CBO to consider nonlinear
> join orders in parallel processing environments.
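(For anyone unfamiliar with the ROWNUM trickery John mentions: referencing
ROWNUM inside an inline view prevents Oracle from merging that view into
the outer query, so each view is evaluated as a unit before the final join.
A sketch, reusing my invented filter columns from above:)

    select v1.colx, v2.colx
      from (select b.colx, rownum rn1    -- ROWNUM blocks view merging,
              from a, b                  -- so (AB) is computed first
             where a.fkey1 = b.pkey1
               and a.filter_col = :val1) v1,
           (select c.colx, rownum rn2    -- likewise, (CD) is computed
              from c, d                  -- as a unit
             where c.fkey2 = d.pkey2
               and d.filter_col = :val2) v2
     where v1.colx = v2.colx;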
One-at-a-time join paths do not have to use serial plans, either - nothing
inherently prevents the work driven from the driving table from being
divided up. I *very* rarely find a query that needs a parallel plan to
perform just fine, however, once it is well tuned. More often, I find
parallelism hiding just how inefficient a non-optimal plan is, by throwing
resources at it.

Yes, I am saying that one-at-a-time paths tend to perform roughly as well
or better than more exotic paths, *in my own experience*, and the argument
is based on my assumption of a "normal" join tree. The best counterexamples
I've found have been queries that were wrong from a logical point of view,
doing Cartesian products or many-to-many joins that the application was
better off without. You may have found examples of functionally useful
many-to-many joins (where the "many" on both sides really is "many", not
"nearly one"), I grant - I just haven't seen them in my own experience,
save maybe once in a blue moon.

> Regards,
>
> John Smiley

--
http://www.freelists.org/webpage/oracle-l