Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: CBO histograms & density

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Fri, 15 Nov 2002 10:26:34 -0000
Message-ID: <ar2i63$mif$1$8302bc10@news.demon.co.uk>

Notes in-line

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA__________November 7/9   (Detroit)
____USA__________November 19/21 (Dallas)
____England______November 12/14

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html





Stephan Bressler wrote in message ...

>Hi Gurus,
>
>I've got a tricky problem on 9iR2 / HPUX.
>
>The statement is:
>select * from table where object_id=1234
>
>The table contains 50,000 rows in 900 blocks. There's an index on object_id
>with ~26,000 distinct keys. object_id is a number column.
>
So this is a single column index with an average of 2 rows per value - are null values allowed, and are there any very popular values?
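Both questions can be answered with a couple of simple queries - a sketch only, since the post never names the table (I've assumed t1):

```sql
-- How many rows per value, and are any values unusually popular?
-- (t1 is an assumed table name; the post just says "table")
select object_id, count(*)
from   t1
group by object_id
order by count(*) desc;

-- Are there any nulls in the indexed column?
select count(*)
from   t1
where  object_id is null;
```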
>We analyze our schema using
> dbms_stats.gather_table_stats( .., estimate=>false,
> method_opt=>'for all indexes for all indexed columns size 16',
> cascade=>true)
>(BTW: the same happens if using "analyze schema ...")
>
In most cases, you do not need histograms to get a good optimisation path. If you do need histograms, you probably need more than 16 buckets.
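If a histogram really is wanted on this one column, a sketch of collecting it with the maximum bucket count (254) rather than 16 - the table name t1 is an assumption, and I've used the real estimate_percent parameter (null = compute) rather than the "estimate=>false" in the post:

```sql
begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'T1',   -- assumed table name
    estimate_percent => null,   -- compute, as the original call intended
    method_opt       => 'for columns object_id size 254',
    cascade          => true
  );
end;
/
```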
>After that, the query is done by a full table scan (FTS). I believe the
>problem to be the information dba_tab_cols.density, which contains 0.011. In
>turn the optimizer believes it will get 0.011*50000 = 600 rows for a given
>object_id by an index range scan.
The density is derived by building the histogram, estimating the number of distinct values, the number of nulls, identifying popular values, factoring popular values out of the total data set, and then dividing the estimated number of remaining rows by the number of distinct values, allowing for nulls. I'm having trouble working out how the data you have described (with 16 buckets) could end up setting this value to 0.011
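The inputs to that derivation can be inspected directly in the dictionary views (again assuming the table is called t1):

```sql
-- Summary statistics for the column, including the derived density
select column_name, num_distinct, num_nulls, density, num_buckets
from   user_tab_columns
where  table_name  = 'T1'
and    column_name = 'OBJECT_ID';

-- The histogram endpoints themselves
select endpoint_number, endpoint_value
from   user_tab_histograms
where  table_name  = 'T1'
and    column_name = 'OBJECT_ID'
order by endpoint_number;
```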
>This seems to be 3 times more expensive than the FTS (which, of course, is
>wrong!).
>
>The value of density is dependent on the number of buckets named in the
>analyze statement. The more buckets, the better the value. A value of 200
>buckets produces a density value of 2.1e-4, while the accurate value of
>1/26,000 = 3.8e-5 is produced only if the "for all indexed columns" is omitted
>or if a bucket size of 1 is given.
>If density is close to the correct value, the optimizer chooses a "good"
>(=index) plan.

Generally to be expected - although sometimes luck plays a hand. The more buckets you have, the more detail Oracle acquires about popular, very popular and quite popular values, and the better the detail about the spread of non-popular values.
>
>My questions are:
>1. If there is histogram information available, why is the column overall
>density used at all?
It isn't - your query should be using the endpoint_values that bound the input literal. The density is there for the benefit of bind-variable queries. (This may explain the tablescan, but not the density.)
>2. What does dba_tab_cols.density mean in the presence of histograms?
>
It is an indicator of the selectivity of non-popular values.
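One way to see which figure the optimizer actually used is the estimated Rows column of the execution plan; with a literal predicate it should come from the histogram endpoints rather than the density. A sketch using dbms_xplan (available in 9iR2; t1 is still an assumed table name):

```sql
explain plan for
  select * from t1 where object_id = 1234;

select * from table(dbms_xplan.display);
```

If the Rows estimate drops to ~2 (the average rows per key) after regathering with more buckets, the index plan should follow.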
>TIA
>Stephan
Received on Fri Nov 15 2002 - 04:26:34 CST
