# Re: Re: Density statistic calculation in case of histogram

Date: Thu, 9 Apr 2009 10:21:39 +0200

Message-ID: <4ef2fbf50904090121o26e9e41aj983b52c7c041d751_at_mail.gmail.com>

On Thu, Apr 9, 2009 at 03:19, Neeraj Bhatia <neeraj.dba_at_gmail.com> wrote:

*> Thanks Alberto for the update. Actually I am investigating the calculations*

*> in R10 and my plan was to later cover R11 as well. Since still most of the*

*> systems are on R10 and in future also (atleast 2-3 years), I am very*

*> interested.*

Ok, please find below my interpretation about the meaning of density, retrieved from an old offline discussion.

First of all, for a filter predicate "where column=constant", density is not used for popular values. So, consider only the case of constant being not popular, and imagine removing the popular values from the table (which is what dbms_stats does).

Of course the estimated cardinality of a filter predicate "where
column=constant" "has to be" the expected value of the cardinality,
that is, the average value of the number of rows retrieved over all possible
values of constant.

If we knew the probability mass function (PMF) of "constant" (i.e. we could
characterize the workload perfectly), the expected value would be

sum( count(constant) * PMF (constant) ), over all values of constant

dbms_stats knows count(constant) perfectly, knows nothing about the PMF.

If we assume PMF(constant) = count(constant) / num_rows, that is, the more frequently constant is represented in the table, the higher the probability of being requested by the client is => we get the square-based "OldDensity" formula.

If we assume PMF(constant) = 1 / num_distinct, that is, we assume
that each *distinct value* has the same probability of being requested
by the client

regardless of how frequently it is represented in the table
=> we get the "NewDensity" formula (which is also the most intuitive).

Remember that the above figures must be adjusted by num_rows, since the estimated cardinality is density * num_rows, where num_rows is the total number of rows of the original table.

hth

Alberto

-- Alberto Dell'Era "the more you know, the faster you go" -- http://www.freelists.org/webpage/oracle-lReceived on Thu Apr 09 2009 - 03:21:39 CDT