Re: multi-column indexes, statistics and selectivity

From: Robert Klemme <shortcutter_at_googlemail.com>
Date: Sat, 05 Dec 2009 14:18:38 +0100
Message-ID: <7nv4teF3nou04U1_at_mid.individual.net>



On 12/04/2009 11:21 PM, Mladen Gogala wrote:
> On Fri, 04 Dec 2009 16:48:43 +0000, bugbear wrote:
> 

>> Can anyone tell me (or point me at information) about how statistics are
>> generated/used for multi-column indexes?
>>
>> We have a 2 column status model (major and minor status, if you like).
>> This is key to our application.
>>
>> "Naturally" we have put a 2 column index on these two columns.
>>
>> But we have "doubts" about whether the CBO is always doing as good as
>> job as we'd like.
>>
>> The key question seems to be:
>>
>> Are the columns "treated as a pair" so that the frequency estimate for a
>> pair of statuses depends on both of them or...
>>
>> Are the columns treated in isolation
>> so that the frequency estimate
>> is simply obtained by getting the estimate for each column, and
>> combining them mathematically.
>>
>> This would make a big difference in our app, since our major/minor
>> status (in practice) have interesting and complex correlations.
> 
> 10g doesn't do multi-column histograms. 10g will create histograms for 
> every single column of the index and will estimate the number of entries 
> retrieved for each column by using those histograms. After that, it will 
> calculate the selectivity of the conditions by multiplying the 
> selectivities for every single column. Selectivity is the estimated 
> number of retrieved values divided by the total number of values in that 
> column. When the CBO gets the selectivity, it will multiply it by the 
> index clustering factor to estimate the number of blocks that need to be 
> retrieved. The correlation between the number of blocks and the price of 
> the query is known only to the God and Jonathan, but there definitely is 
> one.

Are there cases where it makes sense to use a FBI to get "multi column" histogram information? Of course, queries then would also have to use that concatenated value as query criteria which makes usage of this quite nasty (especially if the SQL is generated by some kind of persistence container).

Cheers

        robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Received on Sat Dec 05 2009 - 07:18:38 CST

Original text of this message