Density statistic calculation in case of histogram

From: Neeraj Bhatia <neeraj.dba_at_gmail.com>
Date: Mon, 6 Apr 2009 12:17:30 +0530
Message-ID: <a8fd4d730904052347h1d4a1cdcjb130283a6116006f_at_mail.gmail.com>



Hi,

Please tell me how density is calculated in case of histograms.

I've a table with very skewed data:

scott_at_ORADB10G> select col_skew, count(*) from tab_skew group by col_skew order by 1;

  COL_SKEW COUNT(*)
---------- ----------

         1         10
         2         10
         3         10
         4         10
         5         10
         6         10
         7         10
         8         10
         9         10
        10       9910

10 rows selected.

scott_at_ORADB10G> exec dbms_stats.delete_table_stats(user,'TAB_SKEW');

PL/SQL procedure successfully completed.

  • Creating width-based histograms, #buckets = #distinct values scott_at_ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW', method_opt=>'FOR COLUMNS COL_SKEW size 10');

PL/SQL procedure successfully completed.

scott_at_ORADB10G> select * from dba_tab_col_statistics where table_name='TAB_SKEW' and column_name='COL_SKEW';

OWNER TABLE_NA COLUMN_NAME NUM_DISTINCT LOW_VA HIGH_V DENSITY NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE AVG_COL_LEN HISTOGRAM

------ -------- -------------------- ------------ ------ ------ ----------
---------- ----------- --------- ----------- --- --- -----------
SCOTT  TAB_SKEW COL_SKEW                       10 C102   C10B

.00005 0 10 06-APR-09 10000 YES NO 3
FREQUENCY Question: How density is calculated here? Also it's equal to Selectivity of non-popular values.

scott_at_ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW', method_opt=>'FOR COLUMNS COL_SKEW size 5');

PL/SQL procedure successfully completed.

  • Creating Height-based histograms, #buckets < #distinct values scott_at_ORADB10G> select * from dba_tab_col_statistics where table_name='TAB_SKEW' and column_name='COL_SKEW';

OWNER TABLE_NA COLUMN_NAME NUM_DISTINCT LOW_VA HIGH_V DENSITY NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE AVG_COL_LEN HISTOGRAM

------ -------- -------------------- ------------ ------ ------ ----------
---------- ----------- --------- ----------- --- --- -----------
SCOTT  TAB_SKEW COL_SKEW                       10 C102   C10B

.98209 0 5 06-APR-09 10000 YES NO 3
HEIGHT BALANCED Question: How density is calculated here and how selectivity will be estimated for popolar as well as non-popular values?

Any points/advice will be appreciated.

Neeraj.

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Apr 06 2009 - 01:47:30 CDT

Original text of this message