Antwort: Re: Density statistic calculation in case of histogram

From: Martin Klier <Martin.Klier_at_klug-is.de>
Date: Mon, 6 Apr 2009 16:41:56 +0200
Message-ID: <OFCD4B65B7.C5560FAF-ONC1257590.00508E4F-C1257590.0050BE9E_at_LocalDomain>



Nice to know!

Maybe these papers are of interest for you as well, as much as they have been for me (they have been in this list for at least one time):

http://method-r.com/downloads/doc_download/11-managing-statistics-for-optimal-query-performance-karen-morton

http://richardfoote.wordpress.com/2008/01/04/dbms_stats-method_opt-default-behaviour-changed-in-10g-be-careful/

HTH
--
Mit freundlichem Gruß

Martin Klier
Senior Oracle Database Administrator


Klug GmbH integrierte Systeme
Lindenweg 13, D-92552 Teunz
Tel.: +49 9671/9216-245
Fax.: +49 9671/9216-112
mailto: martin.klier_at_klug-is.de
www.klug-is.de


Geschäftsführer: Johann Klug, Roman Sorgenfrei Sitz der Gesellschaft: Teunz, USt-ID-Nr. DE175481608, HRB Nr. 2037, Amtsgericht Amberg

Neeraj Bhatia <neeraj.dba_at_gmail.com> schrieb am 06.04.2009 15:56:10:

> Von:
>
> Neeraj Bhatia <neeraj.dba_at_gmail.com>
>
> An:
>
> Martin Klier <Martin.Klier_at_klug-is.de>
>
> Kopie:
>
> oracle-l_at_freelists.org, oracle-l-bounce_at_freelists.org
>
> Datum:
>
> 06.04.2009 15:57
>
> Betreff:
>
> Re: Density statistic calculation in case of histogram
>
> Hi Martin,
>
> Version: 10.2.0.1.0
>
> After reading an excellent paper "Histograms - Myths and Facts" by
> Wolfgang Breitling, some points get cleared.
>
> 1)  In case of width-based histograms, density= 1/ (2*number of
> distinct values)
>
> Thus come the figure 0.00005
>
> 2) Without histograms, density should be equal to 1/number of
> distinct values. But in my case, it is null. I am not using
> method_opt while collecting statistics. The global default setting
> is FOR ALL INDEXED COLUMNS SIZE 1.
>
> 3) In case of height-based histograms, I didn't get these calculations.
>
> density =
> Σ cnt2 / ( num_rows˜ * Σ cnt )
> “the sum of the squared frequencies of all non-popular values
> divided by the sum of the frequencies of all non-popular values
> times the count of rows with not null values of the histogram column”.
>
> Any comments will be appreciated.
>
> Neeraj.
> On Mon, Apr 6, 2009 at 5:34 PM, Martin Klier <Martin.Klier_at_klug-is.de>
wrote:
> Hi Neeraj,
>
> no straight answer to your question, but on which DB version are you
> testing?
>
> Keep the estimate_percent default values in mind. Maybe if you force
> dbms_stats to compute stats, the result will differ.
>
> --
> Mit freundlichem Gruß
>
>
> Martin Klier
> Senior Oracle Database Administrator
>


>
> Klug GmbH integrierte Systeme
> Lindenweg 13, D-92552 Teunz
> Tel.:  +49 9671/9216-245
> Fax.: +49 9671/9216-112
> mailto: martin.klier_at_klug-is.de
> www.klug-is.de
>


>
> Geschäftsführer: Johann Klug, Roman Sorgenfrei
> Sitz der Gesellschaft: Teunz, USt-ID-Nr. DE175481608,
> HRB Nr. 2037, Amtsgericht Amberg
>
> oracle-l-bounce_at_freelists.org schrieb am 06.04.2009 08:47:30:
>
> > Von:
> >
> > Neeraj Bhatia <neeraj.dba_at_gmail.com>
> >
> > An:
> >
> > oracle-l_at_freelists.org
> >
> > Datum:
> >
> > 06.04.2009 08:48
> >
> > Betreff:
> >
> > Density statistic calculation in case of histogram
> >
> > Gesendet von:
> >
> > oracle-l-bounce_at_freelists.org
> >
> > Hi,
> > Please tell me how density is calculated in case of histograms.
> > I've a table with very skewed data:
> > scott_at_ORADB10G> select col_skew, count(*) from tab_skew group by
> > col_skew order by 1;
> >   COL_SKEW   COUNT(*)
> > ---------- ----------
> >          1         10
> >          2         10
> >          3         10
> >          4         10
> >          5         10
> >          6         10
> >          7         10
> >          8         10
> >          9         10
> >         10       9910
> > 10 rows selected.
> > scott_at_ORADB10G> exec dbms_stats.delete_table_stats(user,'TAB_SKEW');
> > PL/SQL procedure successfully completed.
> > -- Creating width-based histograms, #buckets = #distinct values
> > scott_at_ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW',
> > method_opt=>'FOR COLUMNS COL_SKEW size 10');
> > PL/SQL procedure successfully completed.
> > scott_at_ORADB10G> select * from dba_tab_col_statistics where
> > table_name='TAB_SKEW' and column_name='COL_SKEW';
> > OWNER  TABLE_NA COLUMN_NAME          NUM_DISTINCT LOW_VA HIGH_V
> > DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE
> > AVG_COL_LEN HISTOGRAM
> > ------ -------- -------------------- ------------ ------ ------
> > ---------- ---------- ----------- --------- ----------- --- ---
> > ----------- ---------------
> > SCOTT  TAB_SKEW COL_SKEW                       10 C102   C10B
> > .00005          0          10 06-APR-09       10000 YES
> > NO            3 FREQUENCY
> > Question: How density is calculated here? Also it's equal to
> > Selectivity of non-popular values.
> > scott_at_ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW',
> > method_opt=>'FOR COLUMNS COL_SKEW size 5');
> > PL/SQL procedure successfully completed.
> > -- Creating Height-based histograms, #buckets < #distinct values
> > scott_at_ORADB10G> select * from dba_tab_col_statistics where
> > table_name='TAB_SKEW' and column_name='COL_SKEW';
> > OWNER  TABLE_NA COLUMN_NAME          NUM_DISTINCT LOW_VA HIGH_V
> > DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE
> > AVG_COL_LEN HISTOGRAM
> > ------ -------- -------------------- ------------ ------ ------
> > ---------- ---------- ----------- --------- ----------- --- ---
> > ----------- ---------------
> > SCOTT  TAB_SKEW COL_SKEW                       10 C102   C10B
> > .98209          0           5 06-APR-09       10000 YES
> > NO            3 HEIGHT BALANCED
> > Question: How density is calculated here and how selectivity will be
> > estimated for popolar as well as non-popular values?
> >
> > Any points/advice will be appreciated.
> > Neeraj.
>

i0zX+n{+i^ Received on Mon Apr 06 2009 - 09:41:56 CDT

Original text of this message