Re: clustering factor and uniqueness

From: Richard Foote <richard.foote_at_bigpond.nospam.com>
Date: Thu, 30 Mar 2006 13:12:08 GMT
Message-ID: <IOQWf.20262$dy4.3201@news-server.bigpond.net.au>

<yoram.ayalon_at_structuredweb.com> wrote in message news:1143668115.350817.120590_at_g10g2000cwb.googlegroups.com...
>I feel so stupid... I was using number instead of varchar2. now its
> working MUCH faster
>
> but what does this say about my original question on clustering factor
> and uniqueness? what is deemed the best practice in terms of type of
> index to use, bitmap or btree, and the need to rebuild table?
>

Hi Yoram,

The clustering factor is simply used by the cost based optimizer to calculate the cost of accessing the table via the use of the specific index.

Simple example:

A table has 50,000 rows that fit in 50 data blocks (1000 rows per block)

A query is calculated by the CBO to access 10% (or 5,000) of rows in this table.

The cost of accessing the table via the index is therefore (roughly) the I/O in reading the index itself (in this example assuming a 2 level index, 1 I/O for the index root block + 10% of index leaf blocks) + the cost of reading the table.

The cost of reading the table is simply 10% of the clustering factor.

In the case of a "really good" CF of 50 (i.e. close to the blocks in the table meaning the data is pretty well sorted in the same manner as the index) that would by 10% of 50 = 5 blocks.

In the case of a "really bad" CF of 50,000 (i.e. close to the number of rows in the table meaning the data is scattered all over the place in relation to the index and each read of the table takes us to a different table block from the previous read) that would be 10% of 50,000 = 5,000 blocks.

In the case of "an average" CF of 500, that would be 10% of 500 or 50 blocks (meaning on average having 100 rows of interest in each and every block).

Therefore the CBO is much more likely to use the index in the first example and not so likely to in the second or third examples.

In the case of a single lookup via a PK, the CF is redundant as we only need to access the one table block. As Xho suggested, the CF becomes a bigger issue/differentiator as the cardinality of the query against the table increases.

Hopes this makes sense and helps.

Cheers

Richard Received on Thu Mar 30 2006 - 07:12:08 CST