# Re: Density statistic calculation in case of histogram

Date: Wed, 8 Apr 2009 19:21:03 +0530

Message-ID: <a8fd4d730904080651g4b9befe1j79513f6761782a0a_at_mail.gmail.com>

Thanks a lot Yong for very straight forward answer and reference. It helped me and now concept it clear to me.

I'll also install newer patch set and check densities on that.

BTW, have you get a chance to look into Wolfgang's presentation, where he discussed the calculation of density for various buckets?

Again thanks a lot!

Neeraj Bhatia

On Wed, Apr 8, 2009 at 1:15 AM, Yong Huang <yong321_at_yahoo.com> wrote:

*>*

*> Neeraj,*

*>*

*> The calculation of density in case of a histogram is documented in the US*

*> patent 6732085:*

*> http://www.freepatentsonline.com/6732085.html*

*> Jonathan Lewis's CBO book p.172 says the same: "sum of the square of the*

*> frequency of the nonpopular values / (number of nonnull rows * number of*

*> nonpopular nonull rows)", where "frequency" is called repetition count in*

*> the patent.*

*>*

*> Intuitive understanding is beyond me. But it's not hard to apply the*

*> proposed formula. In your case of 5 bucket height-balanced histogram, it is:*

*>*

*> bucket# 1 2 3 4 5*

*> col_skew 1-10 10 10 10 10*

*>*

*> 10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2*

*> -------------------------------------------- = 0.001*

*> 10000 * 90*

*>*

*> So the correct answer is 0.001. If you run your test in 10.2.0.4, you'll*

*> get this number. In 10.2.0.1, you get 0.98209 for some reason. You would*

*> "accidentally" get that number if you applied the formula to your 10-bucket*

*> frequency (you call width-based) histogram:*

*>*

*> bucket# 1 2 3 4 5 6 7 8 9 10*

*> col_skew 1 2 3 4 5 6 7 8 9 10*

*>*

*> 10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+9910^2*

*> --------------------------------------------------- = .98209*

*> 10000 * 10000*

*>*

*> Yong Huang*

*>*

*>*

*> Neeraj Bhatia wrote:*

*>*

*> Version: 10.2.0.1.0*

*>*

*> After reading an excellent paper "Histograms - Myths and Facts" by Wolfgang*

*> Breitling, some points get cleared.*

*>*

*> 1) In case of width-based histograms, density= 1/ (2*number of distinct*

*> values)*

*>*

*> Thus come the figure 0.00005*

*>*

*> 2) Without histograms, density should be equal to 1/number of distinct*

*> values. But in my case, it is null. I am not using method_opt while*

*> collecting statistics. The global default setting is FOR ALL INDEXED*

*> COLUMNS*

*> SIZE 1.*

*>*

*> 3) In case of height-based histograms, I didn't get these calculations.*

*>*

*> density = Î£ cnt2 / ( num_rowsËœ * Î£ cnt )*

*>*

*> â€œthe sum of the squared frequencies of all non-popular values divided by*

*> the*

*> sum of the frequencies of all non-popular values times the count of rows*

*> with not null values of the histogram columnâ€�.*

*>*

*>*

*>*

*>*

*>*

-- http://www.freelists.org/webpage/oracle-lReceived on Wed Apr 08 2009 - 08:51:03 CDT