Re: Optimizer not using histogram to calculate estimated number of rows?

From: Alberto Dell'Era <alberto.dellera_at_gmail.com>
Date: Thu, 11 Jun 2009 01:17:17 +0200
Message-ID: <4ef2fbf50906101617u144a1f2fvf96f220cac54386c_at_mail.gmail.com>



Assuming SIZE=254 and estimate_percent = null

ndv=52999 > size=254 => height-balanced histogram

Only if 'HVA_FUNC' is recorded as a popular value the histogram data will be used; if either 'HVA_FUNC' is not popular or it is not recorded in the histogram, num_rows*density is used.

To build the histogram, rows are sorted and then sampled over an grid spaced approx num_rows / size = 166067 / 254 = 653.

So a value most occur at least 2*653 times to be recorded as popular (= sampled at least two times), it might be recorded as popular if it occurs at least 653 times (depending on the exact sampling grid), it is never recorded as popular if it occurs less than 653 times.

In the other case 26000 / 254 = 102, hence it is more likely that some values get recorded as popular (since you mention "60 values have over 100 rows each",
some of them might be recorded as popular).

hth
Alberto

On Thu, Jun 11, 2009 at 00:20, David Kurtz<info_at_go-faster.co.uk> wrote:
> I am working on 10.2.0.4 on Linux.  I have a table with a highly skewed set
> of data.
>
> 166067 rows, 52999 distinct values in an indexed column.
> For 3000+ values have 3 rows each, about 60 values have over 100 rows each.
> So I thought this is a good candidate for a histogram.
(snip)

>
> However, the optimizer doesn't seem to use the histogram in calculating the
> cost
> (That is causing problems in a more complex query, because the is causing
> the plan not to change).
>
> select /*+GATHER_PLAN_STATISTICS*/ count(*) from psroleuser where roleuser
> ='HVA_FUNC'
>
> ---------------------------------------------------------------------------------------------
> | Id  | Operation         | Name          | Starts | E-Rows | A-Rows |
> A-Time   | Buffers |
> ---------------------------------------------------------------------------------------------
> |   1 |  SORT AGGREGATE   |               |      1 |      1 |      1
> |00:00:00.01 |       4 |
> |*  2 |   INDEX RANGE SCAN| PS_PSROLEUSER |      1 |      3 |    169
> |00:00:00.01 |       4 |
> ---------------------------------------------------------------------------------------------
>
> I have another version of this table on a test database, but the volume is
> slightly smaller (26000 rows, 3300 distinct values)
>
> That uses the histogram exactly as I would expect, the estimated number of
> rows is close to reality, and the execution plan of the complex query
> changes.

-- 
Alberto Dell'Era
"the more you know, the faster you go"
--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jun 10 2009 - 18:17:17 CDT

Original text of this message