Re: UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone used ROW_NUMBER() to get around using a UNION statement when UNION ALL doesn't work??")

From: Greg Rahn <greg_at_structureddata.org>
Date: Fri, 12 Aug 2011 14:15:35 -0700
Message-ID: <CAGXkmit4y=RA0zzY_o0duqgaOH2Ov-K0xOxjExmgzoAer=bxeg_at_mail.gmail.com>



So, let's start from the top...

At a high level the difference here is a case of "less bad or less unlucky" - the cardinality estimates are horrid here which makes be believe there are probably non-representative stats. This raises the question: What is the stats strategy and dbms_stats command being used here?

The main problem with #1 is that deep in the plan cardinality is grossly underestimated and that cascaded in to a series of NLJs where 56,540,199 rows show up and then have a FILTER applied to sort it down to 89,200 (so much for FILTER early). Compare that to #3 (and #2) where there is just less than 3 orders of magnitude less rows flowing through the NLJ and 89,276 sort down to 89,200 (because of a plan difference where the FILTER is applied in an un-merged view (lines 22-26) which also shows up in the UNION ALL #2 plan). This is a chronic case of CBO optimizing for a small number of rows and getting orders of magnitude more than that. When that happens all bets are off...

One can not make any generalizations about UNION vs UNION ALL vs WINDOW SORT because the input into those row sources for these plans are grossly different here (the latter 2 have the un-merged view with the filter which is why they are close in execution times). This is why I asked early on to see the plans and why it is so important to include that information - both the plan and execution stats/actual row counts/etc.

Assuming this is not 11g, I'd be interested to see what plan you would get if you gathered stats with a 100% sample size - offering the CBO the best possible chance at stats (in 11g, the default of auto_sample_size should be used).

Main takeaway here is to fix the input (stats) before attempting to fix the output (the execution plan). The CBO generally follows the cliche "garbage (stats) in, garbage (plans) out".

Hope that helps...

On Fri, Aug 12, 2011 at 1:38 PM, Jeremy Schneider <jeremy.schneider_at_ardentperf.com> wrote:
> I don't know exactly how oracle would cost a UNION versus how oracle would
> cost a UNION-ALL plus a WINDOW.  But it probably depends a bit on some
> environmental factors including row-source cardinality, work area size,
> estimated cardinality of sort columns, etc...  But it seems immaterial which
> comes out cheaper from the CBO - you just have to decide which way to code
> it then help the optimizer get the best plan with your code.  Personally I'd
> go with the union and fix the optimizing of that plan, which is the simpler
> and more readable to do the same thing.  Then, a little digging to see where
> Oracle went wrong in it's guessing.  Calling
> display_plan(null,null,'ALLSTATS LAST') right after an execution can help
> with this since it'll give you a side-by-side comparison of estimated to
> actual rowcounts.  Ideally the optimizer should be getting the same plan for
> both ways of writing the query.
>
> On Fri, Aug 12, 2011 at 3:21 PM, Taylor, Chris David
> <ChrisDavid.Taylor_at_ingrambarge.com> wrote:
>>
>> So it is interesting to me then that Oracle would pick a better plan for
>> UNION-ALL vs UNION in that case with the same information available to the
>> optimizer at the time each query is executed.
>> In that regard then, UNION would seem to be a more expensive operation
>> REGARDLESS of the additional SORT operation that must occur.
>>
>> (Theory) Following that, IF UNION-ALL is always less expensive than UNION
>> (before the additional SORT operation) then the WINDOW function applied
>> (depending on overhead) will OFTEN come out cheaper than a UNION.
>>
>> From: Jeremy Schneider [mailto:jeremy.schneider_at_ardentperf.com]
>> Sent: Friday, August 12, 2011 2:49 PM
>> To: Taylor, Chris David
>> Cc: Greg Rahn; Wolfgang Breitling (breitliw_at_centrexcc.com); mwf_at_rsiz.com;
>> oracle-l_at_freelists.org
>> Subject: Re: UNION ALL with ROW_NUMBER vs UNION (WAS: "Wow - has anyone
>> used ROW_NUMBER() to get around using a UNION statement when UNION ALL
>> doesn't work??")
>>
>> Aaaah, it's clear now.  This has nothing to do with an optimization
>> between UNION vs ROW_NUMBER -- it's that your plan completely changed when
>> you switched the syntax -- most importantly the table join order.  If you
>> use the same join order with the UNION then you should see similar
>> performance.
>>
>> It seems that there are two table join orders with nearly identical cost.
>> Something in the costing of the UNION statement causes the optimizer to pick
>> the unlucky order, whereas with the UNION ALL it picks the more lucky one.

-- 
Regards,
Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l
Received on Fri Aug 12 2011 - 16:15:35 CDT

Original text of this message