Re: Index clustering factor

From: D.Y. <dyou98_at_aol.com>
Date: 20 Jun 2002 09:43:39 -0700
Message-ID: <f369a0eb.0206200843.5680d099@posting.google.com>

ashok_jayaraman_at_yahoo.com (Jayaraman Ashok) wrote in message news:<7ca2852.0206192223.555f04d9_at_posting.google.com>...
> Hi
> Can somebody please explain in a lucid manner, what clustering_factor
> means in dba_indexes? I have read that if the value is nearer to the
> number of blocks occupied (or is it allocated?) by the table, it means
> the data in the table is very well ordered and if the value is nearer
> to the number of rows in the table then the data is scattered. Does it
> mean that the clustering_factor is something like a pointer? A pointer
> for each block if data is proper in the table and a pointer for each
> row when data becomes scattered. If the table is going to grow at a
> decent rate then won't the clustering_factor again go for a toss after
> reorganization? Also should the reorganization of the table always be
> based on the most used index for that table , for the exercise to be
> effective? Finally, is it worth paying attention to this aspect?
>

I think clustering factor is meant to be equal to the number*times physical blocks
in a table need to be read in via physical I/Os when you access the entire table
using index full scan. And the calculation assumes no caching.

If your index and table are perfectly aligned, as you read an index key the associated
row in the table is always right next to the row you just read unless you happen to be
crossing into another block. What this means is each block only needs to be brought in
once. Oracle sequentially reads through all the rows in the block. When this is done
this block is never needed again. So the number of physical I/O's is equal to the number
of blocks in the table, and that is the clustering factor.

On the other hand, if the table is randomly aligned with respect to the index, index
keys in an index block point to rows that are scattered all over the table segment. A
different data block needs to be visited for each index key (not each index block).
With no caching, as many physical I/O's as the number of rows may be needed to access
the entire table. So the calculated clustering factor is very high here. In a real
system, there is always some level of caching. You almost certainly don't need an I/O
for each row. In fact not even close to that.

This statistics is one of the factors the optimizer considers when choosing an access
path. If the value is very high it's considered a bad index, and vice versa. As far as
performance, I normally don't see a big improvement by tuning clustering factor, not
more than a few percent. It could be because we have so much cache these days. You could
get more significant performance improvement from a well aligned table if you have large
tables and relatively small cache, or if you have multiple batches running forcing
scattered blocks to be flushed out more quickly.

> Cheers,
> Ashok
Received on Thu Jun 20 2002 - 11:43:39 CDT