Re: Hash cluster, parameter SIZE

From: Jonathan Lewis <>
Date: Sat, 28 Nov 2009 13:44:42 -0000
Message-ID: <>

"abracadabuda" <> wrote in message
>> 32k is largest block size that oracle supports ...
> Thanks, i got it now.
> I misreaded this:
> If SIZE is larger than the data block size, then the database uses the
> operating system block size, reserving at least one data block for
> each cluster or hash value.
> I haven't mentioned, that the size of my cluster was +/- 2000003 *
> 8KB. My os_block_size is set to 1K, db_block_size is set to 8K.
> Therefore the largest real accepted SIZE was 8KB.
> But anyway, can you recommend me any other solution, how to store such
> amount of data of one customer close one to the other? Partitioning is
> not a good idea...
> Thanks a lot.
> abracadabuda

Why is partitioning not a good idea ? (Apart from the license fees) Technically you haven't said anything that makes it unsuitable, and if you're keeping your data for 24 months it's wise to consider what you're going to do to get rid of the old data after 25 months.

Having said that, you could consider storing the data as an IOT (index organised table). Data for one customer for one month is about 6,000 bytes, which means roughly one block, so for 24 months it would be something like 24 blocks - except you'll have to add a few because the index will probably run at about 50% utilisation.

I assume that you will have a primary key of

    (customer_id, transaction_date, sequence_number) where the sequence number exists to allow for two transactions for a customer on a given date. Create the primary with compression on the first column.

This is a case where you could argue for a larger block size - especially if your queries always range over more than one month - as this would allow a single block I/O to fetch a larger chunk of history. On the other hand, if most of your queries are just for the last couple of weeks, you might as well stay with the typical 8KB.

An alternative would be an index cluster - clustered on customer id, but with an extra index on (customer_id, transaction_date). The clustering would pull data into a small number of (scattered) blocks, time-dependency would pack each customer's data in a time-related fashion, and the second index would allow you to pick data for a small date range for a single customer without having to visit all the blocks for that customer.


Jonathan Lewis
Received on Sat Nov 28 2009 - 07:44:42 CST

Original text of this message