Re: clustering factor and partitioned tables

From: Tim Gorman <tim_at_evdbt.com>
Date: Tue, 26 Jun 2007 16:45:08 -0600
Message-ID: <468196F4.603@evdbt.com>

Mr. DBA,

The "clustering factor" of an index does not influence a table; rather it is the other way around. The distribution of column values in the rows of the table influences use of the index, particularly during a "range scan".

A good example is the standard demo EMP table -- let's imagine that it has only two indexes, one on the EMPID column and the other on the ENAME column.

When rows are inserted into the EMP table, a sequence is used to assign values to the EMPID column, but the values populating the ENAME column are as random as the company's hiring practices are. In other words, you can count on a great "clustering factor" in EMPID, because rows are inserted adjacent to one another, usually within the same database block, spilling over into the "next" database block, with sequentially ascending values. However, you cannot say the same for the ENAME column, because "ADAMS" could be hired right after "ZACHARY" who was hired right after "MONROE", etc. So, if someone is scanning the table using the index on EMPID, looking for employees with EMPIDs between 100 and 200, then all of those rows are probably in 1-2 database blocks (i.e. great clustering factor). However, if someone is scanning the table using the index on ENAME, looking for employees whose last names start with the letter "S", then those rows could be in any of the several hundred database blocks that might comprise the entire table (i.e. terrible clustering factor).

The sad irony that goes with this example of the EMP table is that the EMPID column is usually used for UNIQUE SCANs, not RANGE SCANs. That is, applications probably don't search for employees by scanning ranges of values on EMPID. On the other hand, applications probably do scan for employee records more often based on ranges of employee name (i.e. looking for all employees whose names begin with "SMI%"). So, the excellent clustering factor on the index on EMPID generally "goes to waste", while the terrible clustering factor on the index on ENAME gets used fairly heavily. C'est la vie...

A global index could be adversely affected by the partitioning on a table, because the partitioning scheme might cause further scattering of rows across partitions than might otherwise "cluster" together within the same database block(s). This adverse impact probably increases with the number of partitions, though again it is heavily dependent on the behavior of the application with respect to how rows are inserted (i.e. in what order) and what the partition-key column is. If the partition-key column is somehow time-based or somehow temporally correlated to the indexed column, then there likely won't be much impact on the efficiency of the global index due to the partitioning. However, if the partition-key column is not in any way correlated to the values of the indexed column, then it is possible that partitioning could make the "clustering factor" even worse than would have been the case on a non-partitioned table.

Hope this helps?

Tim Gorman
consultant - Evergreen Database Technologies, Inc. P.O. Box 1802, Evergreen CO 80437-1802

website = http://www.evdbt.com
email   = tim_at_evdbt.com
mobile  = +1-303-885-4526
fax     = +1-303-484-3608

Joe Smith wrote:
> I am reading Tom Kyte's book and on page 444 he deals with clustering
> factor. I have a partitioned table with a global index. Does anybody
> have information on how the clustering factor influences a partitioned
> table with global index?
>
> thanks.
>
> _________________________________________________________________
> Get a preview of Live Earth, the hottest event this summer - only on
> MSN http://liveearth.msn.com?source=msntaglineliveearthhm
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
http://www.freelists.org/webpage/oracle-l

Received on Tue Jun 26 2007 - 17:45:08 CDT