Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> Density calculation. Was: Incorrect cardinality estimate

Density calculation. Was: Incorrect cardinality estimate

From: Laimutis Nedzinskas <Laimutis.Nedzinskas_at_landsbanki.is>
Date: Wed, 24 Jan 2007 16:26:20 -0000
Message-ID: <5A8896FB2AFC5445A7DCFC5903CCA6B06FEC72@W03856.li01r1d.lais.net>


-----Original Message-----
From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Wolfgang Breitling
>Ergo, in order to get the cardinality estimate down you have to change
>either num_rows or density. I suggest you whack density to
>2.5e-6 or even less. Because you did gather the histogram this won't
>affect any sql which go after popular values. It will affect sql where
>you have a join on the ods_process_date column.

It seems if Oracle is doing a natural thing when calculating density. If histogram is provided then Oracle tries to remove "noise" from density calculation.
A small test reveals that Oracle is able to effectively remove very popular values.
The formula is (kind of) following:

Density=Cardinality_for_Unpopular_Values/Number_of_Rows

For example:

create table test_dens
(id number
, nh number --number column with histogram , n number --number column w/o histogram );

---Case 3
insert into test_dens
select rownum, rownum, rownum from all_objects where rownum<=5000;

insert into test_dens
select rownum, 0, 0 from all_objects where rownum<=2500;

insert into test_dens
select rownum, 0.5, 0.5 from all_objects where rownum<=2500;

--
begin sys.dbms_stats.gather_table_stats('LTDLNE', 'TEST_DENS',
method_opt=>'FOR COLUMNS NH SIZE 254, N SIZE 1'); end;
/

select t.table_name, t.column_name, t.num_distinct, t.density,
t.num_nulls, t.num_buckets, t.histogram 
from user_tab_col_statistics  t where table_name='TEST_DENS';

Table		column	num_rows	density
num_nulls	buckets	histogram type	
TEST_DENS 	NH 		5002		0,0001
0		254		HEIGHT BALANCED
TEST_DENS 	N  		5002		0,000199920031987205
0		1		NONE

Density for column N is exactly 1/5002
Density for column NH is:

Density=Cardinality_for_Unpopular_Values/Number_of_Rows = 1/10000 =
0,0001
, where Cardinality_for_Unpopular_Values=1 
because we have 5002(total distinct values)-2(popular
values)=5000(unpopular distinct values) for 5000 unpopular rows yielding
cardinality 5000(unpopular rows)/5000(unpopular distinct values)=1

--- 
Another test with Cardinality_for_Unpopular_Values=2: 

insert into test_dens
select rownum, mod(rownum,1000), mod(rownum,1000) from all_objects where
rownum<=5000;

insert into test_dens
select rownum, 10001, 10001 from all_objects where rownum<=2500;

insert into test_dens
select rownum, 10002, 10002 from all_objects where rownum<=2500;

And we have:

Density_for_NH_columns is 0.0005 = (5000/(1002-2))/10000,
where  
5000 - unpopular rows, 
1002 - total distinct values, 
2 - popular rows
10000 - total rows

---

The problem is how Oracle cuts this noise. It may use several techniques
but all of them are heuristic anyway. A guess.



Fyrirvari/Disclaimer
http://www.landsbanki.is/disclaimer
--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jan 24 2007 - 10:26:20 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US