Re: A question about huge difference in cardinality of a query with 3 predicates

From: Nirav A Shah <shivam71_at_hotmail.com>
Date: Sun, 29 Oct 2017 16:52:16 +0000
Message-ID: <SYXPR01MB095953D390B02A4F05324A27C1580_at_SYXPR01MB0959.ausprd01.prod.outlook.com>



Thanks a lot Jonathan and Andrew!! I am now getting convinced that it is histogram that is causing the issue but would love to see how in a test setup the issue could get resolved. Meaning , how can I differently create a histogram in a test setup that gets the cardinality in a nearly ballpark range...

I will go over the suggestions you have mentioned and post back in next two days time.

Thanks again,

Nirav



From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk> Sent: Sunday, October 29, 2017 8:25 PM
To: oracle-l_at_freelists.org; shivam71_at_hotmail.com Subject: Re: A question about huge difference in cardinality of a query with 3 predicates

Nirav,

Thanks for posting so much relevant information. There are several possible effects visible here. The first is simply that to the optimizer the selectivity of a

combination of columns is the product of the individual selectivities. This is why you get an estimate of 142 when you drop the histograms:

        143 = 2231714 * (1/3 * 1/50 * 1/104)

Secondly (as Andrew says), if one of your predicates goes out of range, Oracle scales down its selectivity by a measure of how far out of range it is.

Third, when a column has a frequency histogram on it and you ask for a value that doesn't seem to be in the histogram then Oracle uses "half the least popular" value as the selectivity (and then scales that if you're also out of range). You can see that two of your frequency histograms have "num_buckets" less than "num_distinct", so the gather for the histogram must have missed some values (or num_buckets would equal num_distinct). Notice that the sample for the histograms is only 5,548 rows, so with a heavy skew (which I think you must have to get 10,000 rows as the final result) it's not surprising that you've missed a few values somewhere. You may do better if you gather histograms on these columns with a sample size of 100% - but you'd still run into the "combination of columns" problem.

So - you need to create a column group (dbms_stats.create_extended_stats) across all three columns or (possibly) on the two columns that are most closely correlated; and you'll need a histogram on the extended stats or the individual histograms will make the optimizer ignore the benefit of the column group. Even then, unfortunately, you may run into problems because you really need the column group histogram to be a frequency histogram and you may have too many combinations across the three columns to get one (hence my comment about picking two out of three).

Here's a search link to a few notes that may be useful: https://jonathanlewis.wordpress.com/?s=column+group+histogram. Oracle Scratchpad on WordPress.com<https://jonathanlewis.wordpress.com/?s=column+group+histogram> jonathanlewis.wordpress.com
Just another Oracle weblog

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Oct 29 2017 - 17:52:16 CET

Original text of this message