Re: Density statistic calculation in case of histogram

From: Yong Huang <yong321_at_yahoo.com>
Date: Wed, 8 Apr 2009 19:16:07 -0700 (PDT)
Message-ID: <481989.36570.qm_at_web80604.mail.mud.yahoo.com>


You're welcome. I found a 10.2.0.2 database and tried your test case. The problem of density calculation in 10.2.0.1 disappears in 10.2.0.2.

I spent many hours reading Wolfgang's CBO articles in 2003 and 2004. You don't need to ask, because it's not possible for me to miss any important articles or books on Oracle.

To add to the list of formulas of density calculation, there's one more rarely mentioned. Tom Kyte showed a "limited"-published Metalink document at http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2969235095639 That may be applicable only to certain old versions.

Yong Huang

  • On Wed, 4/8/09, Neeraj Bhatia <neeraj.dba_at_gmail.com> wrote:

From: Neeraj Bhatia <neeraj.dba_at_gmail.com> Subject: Re: Density statistic calculation in case of histogram To: yong321_at_yahoo.com
Cc: oracle-l_at_freelists.org
Date: Wednesday, April 8, 2009, 8:51 AM

Thanks a lot Yong for very straight forward answer and reference. It helped me and now concept it clear to me. I'll also install newer patch set and check densities on that.

BTW, have you get a chance to look into Wolfgang's presentation, where he discussed the calculation of density for various buckets?

Again thanks a lot!
Neeraj Bhatia

On Wed, Apr 8, 2009 at 1:15 AM, Yong Huang <yong321_at_yahoo.com> wrote:

>
> Neeraj,
>
> The calculation of density in case of a histogram is
documented in the US
> patent 6732085:
> http://www.freepatentsonline.com/6732085.html
> Jonathan Lewis's CBO book p.172 says the same:
"sum of the square of the
> frequency of the nonpopular values / (number of
nonnull rows * number of
> nonpopular nonull rows)", where

"frequency" is called repetition count in
> the patent.
>
> Intuitive understanding is beyond me. But it's not
hard to apply the
> proposed formula. In your case of 5 bucket
height-balanced histogram, it is:
>
> bucket# 1 2 3 4 5
> col_skew 1-10 10 10 10 10
>
> 10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2
> -------------------------------------------- = 0.001
> 10000 * 90
>
> So the correct answer is 0.001. If you run your test
in 10.2.0.4, you'll
> get this number. In 10.2.0.1, you get 0.98209 for some
reason. You would
> "accidentally" get that number if you

applied the formula to your 10-bucket
> frequency (you call width-based) histogram:
>
> bucket# 1 2 3 4 5 6 7 8 9
10
> col_skew 1 2 3 4 5 6 7 8 9
10
>
> 10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+9910^2
> --------------------------------------------------- =
.98209
> 10000 * 10000
>
> Yong Huang
>
>
> Neeraj Bhatia wrote:
>
> Version: 10.2.0.1.0
>
> After reading an excellent paper "Histograms -
Myths and Facts" by Wolfgang
> Breitling, some points get cleared.
>
> 1) In case of width-based histograms, density= 1/
(2*number of distinct
> values)
>
> Thus come the figure 0.00005
>
> 2) Without histograms, density should be equal to
1/number of distinct
> values. But in my case, it is null. I am not using
method_opt while
> collecting statistics. The global default setting is
FOR ALL INDEXED
> COLUMNS
> SIZE 1.
>
> 3) In case of height-based histograms, I didn't
get these calculations.
>
> density = Σ cnt2 / ( num_rows˜ * Σ cnt )
>
> “the sum of the squared frequencies of all
non-popular values divided by
> the
> sum of the frequencies of all non-popular values times
the count of rows
> with not null values of the histogram column�.
      

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Apr 08 2009 - 21:16:07 CDT

Original text of this message