Re: Density statistic calculation in case of histogram

From: Yong Huang <yong321_at_yahoo.com>
Date: Tue, 7 Apr 2009 12:45:21 -0700 (PDT)
Message-ID: <900201.14286.qm_at_web80604.mail.mud.yahoo.com>


Neeraj,

The calculation of density in case of a histogram is documented in the US patent 6732085: http://www.freepatentsonline.com/6732085.html Jonathan Lewis's CBO book p.172 says the same: "sum of the square of the frequency of the nonpopular values / (number of nonnull rows * number of nonpopular nonull rows)", where "frequency" is called repetition count in the patent.

Intuitive understanding is beyond me. But it's not hard to apply the proposed formula. In your case of 5 bucket height-balanced histogram, it is:

bucket# 1 2 3 4 5
col_skew 1-10 10 10 10 10

10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2
-------------------------------------------- = 0.001
10000 * 90

So the correct answer is 0.001. If you run your test in 10.2.0.4, you'll get this number. In 10.2.0.1, you get 0.98209 for some reason. You would "accidentally" get that number if you applied the formula to your 10-bucket frequency (you call width-based) histogram:

bucket# 1 2 3 4 5 6 7 8 9 10 col_skew 1 2 3 4 5 6 7 8 9 10

10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+10^2+9910^2
--------------------------------------------------- = .98209
10000 * 10000

Yong Huang

Neeraj Bhatia wrote:

Version: 10.2.0.1.0

After reading an excellent paper "Histograms - Myths and Facts" by Wolfgang Breitling, some points get cleared.

  1. In case of width-based histograms, density= 1/ (2*number of distinct values)

Thus come the figure 0.00005

2) Without histograms, density should be equal to 1/number of distinct values. But in my case, it is null. I am not using method_opt while collecting statistics. The global default setting is FOR ALL INDEXED COLUMNS SIZE 1. 3) In case of height-based histograms, I didn't get these calculations.

density = Σ cnt2 / ( num_rows˜ * Σ cnt )

“the sum of the squared frequencies of all non-popular values divided by the sum of the frequencies of all non-popular values times the count of rows with not null values of the histogram columnâ€�.       

--

http://www.freelists.org/webpage/oracle-l Received on Tue Apr 07 2009 - 14:45:21 CDT

Original text of this message