# Re: Density statistic calculation in case of histogram

Date: Wed, 8 Apr 2009 19:16:07 -0700 (PDT)

Message-ID: <481989.36570.qm_at_web80604.mail.mud.yahoo.com>

You're welcome. I found a 10.2.0.2 database and tried your test case. The problem of density calculation in 10.2.0.1 disappears in 10.2.0.2.

I spent many hours reading Wolfgang's CBO articles in 2003 and 2004. You don't need to ask, because it's not possible for me to miss any important articles or books on Oracle.

To add to the list of formulas of density calculation, there's one more rarely mentioned. Tom Kyte showed a "limited"-published Metalink document at http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2969235095639 That may be applicable only to certain old versions.

Yong Huang

- On Wed, 4/8/09, Neeraj Bhatia <neeraj.dba_at_gmail.com> wrote:

From: Neeraj Bhatia <neeraj.dba_at_gmail.com>
Subject: Re: Density statistic calculation in case of histogram
To: yong321_at_yahoo.com

Cc: oracle-l_at_freelists.org

Date: Wednesday, April 8, 2009, 8:51 AM

Thanks a lot Yong for very straight forward answer and reference. It helped me and now concept it clear to me. I'll also install newer patch set and check densities on that.

BTW, have you get a chance to look into Wolfgang's presentation, where he discussed the calculation of density for various buckets?

Again thanks a lot!

Neeraj Bhatia

On Wed, Apr 8, 2009 at 1:15 AM, Yong Huang <yong321_at_yahoo.com> wrote:

*>*

*> Neeraj,*

*>*

*> The calculation of density in case of a histogram is*

documented in the US

*> patent 6732085:*

*> http://www.freepatentsonline.com/6732085.html*

*> Jonathan Lewis's CBO book p.172 says the same:*

"sum of the square of the

*> frequency of the nonpopular values / (number of*

nonnull rows * number of

*> nonpopular nonull rows)", where*

"frequency" is called repetition count in

*> the patent.*

*>*

*> Intuitive understanding is beyond me. But it's not*

hard to apply the

*> proposed formula. In your case of 5 bucket*

height-balanced histogram, it is:

*>*

*> bucket# 1 2 3 4 5*

*> col_skew 1-10 10 10 10 10*

*>*

*> 10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2*

*> -------------------------------------------- = 0.001*

*> 10000 * 90*

*>*

*> So the correct answer is 0.001. If you run your test*

in 10.2.0.4, you'll

*> get this number. In 10.2.0.1, you get 0.98209 for some*

reason. You would

*> "accidentally" get that number if you*

applied the formula to your 10-bucket

*> frequency (you call width-based) histogram:*

*>*

*> bucket# 1 2 3 4 5 6 7 8 9 *

10

*> col_skew 1 2 3 4 5 6 7 8 9 *

10

*>*

*> 10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+9910^2*

*> --------------------------------------------------- =*

.98209

*> 10000 * 10000*

*>*

*> Yong Huang*

*>*

*>*

*> Neeraj Bhatia wrote:*

*>*

*> Version: 10.2.0.1.0*

*>*

*> After reading an excellent paper "Histograms -*

Myths and Facts" by Wolfgang

*> Breitling, some points get cleared.*

*>*

*> 1) In case of width-based histograms, density= 1/*

(2*number of distinct

*> values)*

*>*

*> Thus come the figure 0.00005*

*>*

*> 2) Without histograms, density should be equal to*

1/number of distinct

*> values. But in my case, it is null. I am not using*

method_opt while

*> collecting statistics. The global default setting is*

**FOR ALL INDEXED
**

*> COLUMNS*

*> SIZE 1.*

*>*

*> 3) In case of height-based histograms, I didn't*

get these calculations.

*>*

*> density = Î£ cnt2 / ( num_rowsËœ * Î£ cnt )*

*>*

*> â€œthe sum of the squared frequencies of all*

non-popular values divided by

*> the*

*> sum of the frequencies of all non-popular values times*

the count of rows

*> with not null values of the histogram columnâ€?.*

-- http://www.freelists.org/webpage/oracle-lReceived on Wed Apr 08 2009 - 21:16:07 CDT