Re: multi-column indexes, statistics and selectivity

From: Mladen Gogala <no_at_email.here.invalid>
Date: Fri, 4 Dec 2009 22:21:09 +0000 (UTC)
Message-ID: <pan.2009.12.04.22.21.09_at_email.here.invalid>



On Fri, 04 Dec 2009 16:48:43 +0000, bugbear wrote:

> Can anyone tell me (or point me at information) about how statistics are
> generated/used for multi-column indexes?
>
> We have a 2 column status model (major and minor status, if you like).
> This is key to our application.
>
> "Naturally" we have put a 2 column index on these two columns.
>
> But we have "doubts" about whether the CBO is always doing as good as
> job as we'd like.
>
> The key question seems to be:
>
> Are the columns "treated as a pair" so that the frequency estimate for a
> pair of statuses depends on both of them or...
>
> Are the columns treated in isolation
> so that the frequency estimate
> is simply obtained by getting the estimate for each column, and
> combining them mathematically.
>
> This would make a big difference in our app, since our major/minor
> status (in practice) have interesting and complex correlations.
>
> BugBear

10g doesn't do multi-column histograms. 10g will create histograms for every single column of the index and will estimate the number of entries retrieved for each column by using those histograms. After that, it will calculate the selectivity of the conditions by multiplying the selectivities for every single column. Selectivity is the estimated number of retrieved values divided by the total number of values in that column. When the CBO gets the selectivity, it will multiply it by the index clustering factor to estimate the number of blocks that need to be retrieved. The correlation between the number of blocks and the price of the query is known only to the God and Jonathan, but there definitely is one.

-- 
http://mgogala.byethost5.com
Received on Fri Dec 04 2009 - 16:21:09 CST

Original text of this message