Re: Density statistic calculation in case of histogram

From: Neeraj Bhatia <neeraj.dba_at_gmail.com>
Date: Mon, 6 Apr 2009 19:26:10 +0530
Message-ID: <a8fd4d730904060656l7f23d163g48d2cb37db17a09d_at_mail.gmail.com>



Hi Martin,

Version: 10.2.0.1.0

After reading an excellent paper "Histograms - Myths and Facts" by Wolfgang Breitling, some points get cleared.

  1. In case of width-based histograms, density= 1/ (2*number of distinct values)

Thus come the figure 0.00005

2) Without histograms, density should be equal to 1/number of distinct values. But in my case, it is null. I am not using method_opt while collecting statistics. The global default setting is FOR ALL INDEXED COLUMNS SIZE 1. 3) In case of height-based histograms, I didn't get these calculations.

density = Σ cnt2 / ( num_rows˜ * Σ cnt )

“the sum of the squared frequencies of all non-popular values divided by the sum of the frequencies of all non-popular values times the count of rows with not null values of the histogram column”.

Any comments will be appreciated.

Neeraj.
On Mon, Apr 6, 2009 at 5:34 PM, Martin Klier <Martin.Klier_at_klug-is.de>wrote:

> Hi Neeraj,
>
> no straight answer to your question, but on which DB version are you
> testing?
>
> Keep the estimate_percent default values in mind. Maybe if you force
> dbms_stats to compute stats, the result will differ.
>
> --
> Mit freundlichem Gruß
>
>
> Martin Klier
> Senior Oracle Database Administrator
>
> ------------------------------------------------------------------------------
>
> Klug GmbH integrierte Systeme
> Lindenweg 13, D-92552 Teunz
> Tel.: +49 9671/9216-245
> Fax.: +49 9671/9216-112
> mailto: martin.klier_at_klug-is.de
> www.klug-is.de
>
> ------------------------------------------------------------------------------
>
> Geschäftsführer: Johann Klug, Roman Sorgenfrei
> Sitz der Gesellschaft: Teunz, USt-ID-Nr. DE175481608,
> HRB Nr. 2037, Amtsgericht Amberg
>
> oracle-l-bounce_at_freelists.org schrieb am 06.04.2009 08:47:30:
>
> > Von:
> >
> > Neeraj Bhatia <neeraj.dba_at_gmail.com>
> >
> > An:
> >
> > oracle-l_at_freelists.org
> >
> > Datum:
> >
> > 06.04.2009 08:48
> >
> > Betreff:
> >
> > Density statistic calculation in case of histogram
> >
> > Gesendet von:
> >
> > oracle-l-bounce_at_freelists.org
> >
> > Hi,
> > Please tell me how density is calculated in case of histograms.
> > I've a table with very skewed data:
> > scott_at_ORADB10G> select col_skew, count(*) from tab_skew group by
> > col_skew order by 1;
> > COL_SKEW COUNT(*)
> > ---------- ----------
> > 1 10
> > 2 10
> > 3 10
> > 4 10
> > 5 10
> > 6 10
> > 7 10
> > 8 10
> > 9 10
> > 10 9910
> > 10 rows selected.
> > scott_at_ORADB10G> exec dbms_stats.delete_table_stats(user,'TAB_SKEW');
> > PL/SQL procedure successfully completed.
> > -- Creating width-based histograms, #buckets = #distinct values
> > scott_at_ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW',
> > method_opt=>'FOR COLUMNS COL_SKEW size 10');
> > PL/SQL procedure successfully completed.
> > scott_at_ORADB10G> select * from dba_tab_col_statistics where
> > table_name='TAB_SKEW' and column_name='COL_SKEW';
> > OWNER TABLE_NA COLUMN_NAME NUM_DISTINCT LOW_VA HIGH_V
> > DENSITY NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE
> > AVG_COL_LEN HISTOGRAM
> > ------ -------- -------------------- ------------ ------ ------
> > ---------- ---------- ----------- --------- ----------- --- ---
> > ----------- ---------------
> > SCOTT TAB_SKEW COL_SKEW 10 C102 C10B
> > .00005 0 10 06-APR-09 10000 YES
> > NO 3 FREQUENCY
> > Question: How density is calculated here? Also it's equal to
> > Selectivity of non-popular values.
> > scott_at_ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW',
> > method_opt=>'FOR COLUMNS COL_SKEW size 5');
> > PL/SQL procedure successfully completed.
> > -- Creating Height-based histograms, #buckets < #distinct values
> > scott_at_ORADB10G> select * from dba_tab_col_statistics where
> > table_name='TAB_SKEW' and column_name='COL_SKEW';
> > OWNER TABLE_NA COLUMN_NAME NUM_DISTINCT LOW_VA HIGH_V
> > DENSITY NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE
> > AVG_COL_LEN HISTOGRAM
> > ------ -------- -------------------- ------------ ------ ------
> > ---------- ---------- ----------- --------- ----------- --- ---
> > ----------- ---------------
> > SCOTT TAB_SKEW COL_SKEW 10 C102 C10B
> > .98209 0 5 06-APR-09 10000 YES
> > NO 3 HEIGHT BALANCED
> > Question: How density is calculated here and how selectivity will be
> > estimated for popolar as well as non-popular values?
> >
> > Any points/advice will be appreciated.
> > Neeraj.
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Apr 06 2009 - 08:56:10 CDT

Original text of this message