Oracle FAQ Your Portal to the Oracle Knowledge Grid
 

Home -> Community -> Usenet -> c.d.o.server -> Re: CBO histograms & density

Re: CBO histograms & density

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Fri, 15 Nov 2002 15:06:33 -0000
Message-ID: <ar32is$jso$1$8300dec7@news.demon.co.uk>

Notes inline.

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA__________November 7/9   (Detroit)
____USA__________November 19/21 (Dallas)
____England______November 12/14

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html





Stephan Bressler wrote in message ...

>
>I did some further investigations. It seems that the density
>information changes its meaning in the presence of histograms on a
>given table.
>
That's what I said in my original posting:

<quote>
The density is derived by building the histogram, estimating the
number of distinct values, the number of nulls, identifying popular
values, factoring popular values out of the total data set, and then
dividing the estimated number of remaining rows by the number of
distinct values, allowing for nulls.
</quote>

I guess the point that I didn't make clearly is that Oracle's
understanding of 'popular values' is based on the histogram results.
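The quoted derivation can be sketched in a few lines of Python. This
is only my reading of the description above - the height-balanced
bucketing, the "appears as more than one bucket endpoint = popular"
test, and the final normalisation are assumptions of mine, not
Oracle's actual code:

```python
from collections import Counter

def estimate_density(values, num_buckets=16):
    """Hedged sketch of the density derivation described above.
    The function name and the popularity test are assumptions."""
    non_null = [v for v in values if v is not None]
    n = len(non_null)
    # Height-balanced histogram: sort the data and take the value
    # found every 1/num_buckets of the way down the sorted list.
    s = sorted(non_null)
    endpoints = [s[min(n - 1, (i * n) // num_buckets)]
                 for i in range(1, num_buckets + 1)]
    # Treat a value as 'popular' if it spans more than one bucket
    # endpoint (assumption).
    ep_counts = Counter(endpoints)
    popular = {v for v, c in ep_counts.items() if c > 1}
    counts = Counter(non_null)
    # Factor the popular values out of the data set.
    popular_rows = sum(counts[v] for v in popular)
    remaining_rows = n - popular_rows
    remaining_distinct = len(counts) - len(popular)
    if remaining_distinct == 0:
        return 1.0 / len(counts)
    # Divide the remaining rows by the remaining distinct values,
    # then normalise by the non-null row count to get a
    # selectivity-like density.
    return (remaining_rows / remaining_distinct) / n
```

With uniform data (no popular values) this collapses to roughly
1/num_distinct; a single heavily repeated value is factored out and
drags the density down for everything else.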
>Back to my original question:
>The column ean with a "good" density value in any case has max. 2
>rows per key, while object_id has up to 3000 per key. This seems to
>make the difference.
Should do - this is a "popular value", and will therefore have some
effect on the calculated density. Nevertheless, you have 50,000 rows
and 25,000 distinct values. Call this 48,000 and 24,000 for
convenience. Even if half the rows fell into 8 values at 3,000 rows
per value, I don't think the density should be as high as it is.

Note - to build your histogram of 16 buckets, Oracle takes a data
sample, sorts it into order, and then checks the value every
one-sixteenth of the way down the sorted list (i.e. roughly every
3,000 rows - that's an interesting number).

Your stats have just come through - I would certainly expect to see a
different cost if you gave an index hint to

    id = 5000000

SB XXX object_id  4   4110683
SB XXX object_id  5   6196801
    (gap of 2M)

rather than

    id = 19897000
    (gap of 800)

SB XXX object_id 12  19896336
SB XXX object_id 13  19907124

I can't see any clue why the density should be so high! I am now very
curious - I thought I had this bit of the optimiser cracked, but
obviously I've missed something.

If you want to experiment, you could clone the table and just try:

    analyze table copytab compute statistics for columns xxx size 16;

to see if the stats look very different.

If you are allowed to, I would be interested in playing with the data
set - if you want to 'create table x as select /*+ full */
guilty_column' then export it and email me direct. Let me know the
avg_row_len and I'll import the data and hack it to the right size
and see what I can make of it.
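The one-sixteenth sampling in the note above can be mimicked to see
where the bucket endpoints (and the gaps between them) come from. A
minimal sketch - the function name is my own, and it mirrors the
description in the text, not Oracle's internal code:

```python
def histogram_endpoints(values, num_buckets=16):
    """Sketch of a height-balanced histogram: sort the data and
    record the value found every 1/num_buckets of the way down the
    sorted list, as (bucket_number, endpoint_value) pairs."""
    s = sorted(v for v in values if v is not None)
    n = len(s)
    return [(i, s[min(n - 1, (i * n) // num_buckets)])
            for i in range(1, num_buckets + 1)]
```

A large jump between consecutive endpoints (like bucket 4 to bucket 5
in the posted stats) means one-sixteenth of the rows is spread over a
wide range of values - i.e. the data is sparse in that range.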
Received on Fri Nov 15 2002 - 09:06:33 CST
