Re: Re: Density statistic calculation in case of histogram

From: Neeraj Bhatia <neeraj.dba_at_gmail.com>
Date: Mon, 6 Apr 2009 20:50:52 +0530
Message-ID: <a8fd4d730904060820x274b3861mc206d03e4637b7d0_at_mail.gmail.com>



Basically I am interested to know how density is calculated in case of height-based histograms. Also, how CBO calculate selectivity.

Thnx,
Neeraj

On Mon, Apr 6, 2009 at 8:11 PM, Martin Klier <Martin.Klier_at_klug-is.de>wrote:

> Nice to know!
>
> Maybe these papers are of interest for you as well, as much as they have
> been for me (they have been in this list for at least one time):
>
>
> http://method-r.com/downloads/doc_download/11-managing-statistics-for-optimal-query-performance-karen-morton
>
>
> http://richardfoote.wordpress.com/2008/01/04/dbms_stats-method_opt-default-behaviour-changed-in-10g-be-careful/
>
> HTH
> --
> Mit freundlichem Gruß
>
>
> Martin Klier
> Senior Oracle Database Administrator
>
> ------------------------------------------------------------------------------
>
> Klug GmbH integrierte Systeme
> Lindenweg 13, D-92552 Teunz
> Tel.: +49 9671/9216-245
> Fax.: +49 9671/9216-112
> mailto: martin.klier_at_klug-is.de
> www.klug-is.de
>
> ------------------------------------------------------------------------------
>
> Geschäftsführer: Johann Klug, Roman Sorgenfrei
> Sitz der Gesellschaft: Teunz, USt-ID-Nr. DE175481608,
> HRB Nr. 2037, Amtsgericht Amberg
>
> Neeraj Bhatia <neeraj.dba_at_gmail.com> schrieb am 06.04.2009 15:56:10:
>
> > Von:
> >
> > Neeraj Bhatia <neeraj.dba_at_gmail.com>
> >
> > An:
> >
> > Martin Klier <Martin.Klier_at_klug-is.de>
> >
> > Kopie:
> >
> > oracle-l_at_freelists.org, oracle-l-bounce_at_freelists.org
> >
> > Datum:
> >
> > 06.04.2009 15:57
> >
> > Betreff:
> >
> > Re: Density statistic calculation in case of histogram
> >
> > Hi Martin,
> >
> > Version: 10.2.0.1.0
> >
> > After reading an excellent paper "Histograms - Myths and Facts" by
> > Wolfgang Breitling, some points get cleared.
> >
> > 1) In case of width-based histograms, density= 1/ (2*number of
> > distinct values)
> >
> > Thus come the figure 0.00005
> >
> > 2) Without histograms, density should be equal to 1/number of
> > distinct values. But in my case, it is null. I am not using
> > method_opt while collecting statistics. The global default setting
> > is FOR ALL INDEXED COLUMNS SIZE 1.
> >
> > 3) In case of height-based histograms, I didn't get these calculations.
> >
> > density =
> > Σ cnt2 / ( num_rows˜ * Σ cnt )
> > “the sum of the squared frequencies of all non-popular values
> > divided by the sum of the frequencies of all non-popular values
> > times the count of rows with not null values of the histogram column”.
> >
> > Any comments will be appreciated.
> >
> > Neeraj.
> > On Mon, Apr 6, 2009 at 5:34 PM, Martin Klier <Martin.Klier_at_klug-is.de>
> wrote:
> > Hi Neeraj,
> >
> > no straight answer to your question, but on which DB version are you
> > testing?
> >
> > Keep the estimate_percent default values in mind. Maybe if you force
> > dbms_stats to compute stats, the result will differ.
> >
> > --
> > Mit freundlichem Gruß
> >
> >
> > Martin Klier
> > Senior Oracle Database Administrator
> >
>
> ------------------------------------------------------------------------------
>
> >
> > Klug GmbH integrierte Systeme
> > Lindenweg 13, D-92552 Teunz
> > Tel.: +49 9671/9216-245
> > Fax.: +49 9671/9216-112
> > mailto: martin.klier_at_klug-is.de
> > www.klug-is.de
> >
>
> ------------------------------------------------------------------------------
>
> >
> > Geschäftsführer: Johann Klug, Roman Sorgenfrei
> > Sitz der Gesellschaft: Teunz, USt-ID-Nr. DE175481608,
> > HRB Nr. 2037, Amtsgericht Amberg
> >
> > oracle-l-bounce_at_freelists.org schrieb am 06.04.2009 08:47:30:
> >
> > > Von:
> > >
> > > Neeraj Bhatia <neeraj.dba_at_gmail.com>
> > >
> > > An:
> > >
> > > oracle-l_at_freelists.org
> > >
> > > Datum:
> > >
> > > 06.04.2009 08:48
> > >
> > > Betreff:
> > >
> > > Density statistic calculation in case of histogram
> > >
> > > Gesendet von:
> > >
> > > oracle-l-bounce_at_freelists.org
> > >
> > > Hi,
> > > Please tell me how density is calculated in case of histograms.
> > > I've a table with very skewed data:
> > > scott_at_ORADB10G> select col_skew, count(*) from tab_skew group by
> > > col_skew order by 1;
> > > COL_SKEW COUNT(*)
> > > ---------- ----------
> > > 1 10
> > > 2 10
> > > 3 10
> > > 4 10
> > > 5 10
> > > 6 10
> > > 7 10
> > > 8 10
> > > 9 10
> > > 10 9910
> > > 10 rows selected.
> > > scott_at_ORADB10G> exec dbms_stats.delete_table_stats(user,'TAB_SKEW');
> > > PL/SQL procedure successfully completed.
> > > -- Creating width-based histograms, #buckets = #distinct values
> > > scott_at_ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW',
> > > method_opt=>'FOR COLUMNS COL_SKEW size 10');
> > > PL/SQL procedure successfully completed.
> > > scott_at_ORADB10G> select * from dba_tab_col_statistics where
> > > table_name='TAB_SKEW' and column_name='COL_SKEW';
> > > OWNER TABLE_NA COLUMN_NAME NUM_DISTINCT LOW_VA HIGH_V
> > > DENSITY NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE
> > > AVG_COL_LEN HISTOGRAM
> > > ------ -------- -------------------- ------------ ------ ------
> > > ---------- ---------- ----------- --------- ----------- --- ---
> > > ----------- ---------------
> > > SCOTT TAB_SKEW COL_SKEW 10 C102 C10B
> > > .00005 0 10 06-APR-09 10000 YES
> > > NO 3 FREQUENCY
> > > Question: How density is calculated here? Also it's equal to
> > > Selectivity of non-popular values.
> > > scott_at_ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW',
> > > method_opt=>'FOR COLUMNS COL_SKEW size 5');
> > > PL/SQL procedure successfully completed.
> > > -- Creating Height-based histograms, #buckets < #distinct values
> > > scott_at_ORADB10G> select * from dba_tab_col_statistics where
> > > table_name='TAB_SKEW' and column_name='COL_SKEW';
> > > OWNER TABLE_NA COLUMN_NAME NUM_DISTINCT LOW_VA HIGH_V
> > > DENSITY NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE
> > > AVG_COL_LEN HISTOGRAM
> > > ------ -------- -------------------- ------------ ------ ------
> > > ---------- ---------- ----------- --------- ----------- --- ---
> > > ----------- ---------------
> > > SCOTT TAB_SKEW COL_SKEW 10 C102 C10B
> > > .98209 0 5 06-APR-09 10000 YES
> > > NO 3 HEIGHT BALANCED
> > > Question: How density is calculated here and how selectivity will be
> > > estimated for popolar as well as non-popular values?
> > >
> > > Any points/advice will be appreciated.
> > > Neeraj.
> >
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Apr 06 2009 - 10:20:52 CDT

Original text of this message