Re: IOT or Cluster or neither

From: Aditya Alurkar <aalurkar_at_linkedin.com>
Date: Fri, 18 Aug 2006 16:41:35 -0700
Message-ID: <C10B9E3F.23A6%aalurkar@linkedin.com>

On 8/18/06 4:32 PM, "Christian Antognini" <Christian.Antognini_at_trivadis.com> wrote:

Thank you for your comments..

> Adi
>
>> * Large number of records >200million rows
>> * Tables have a unique "id" for each row (generated using sequence).
>> * The rows can be logically grouped per "user_id"
>> * The rows are primarily accessed per "user_id"
>> * The distribution of the rows is not even among "user_id" i.e. Most
>> users
>> have <50 rows but there are a few who have >10000 rows.
>> * The tables need to have some secondary indexes.
>> * The tables have "minimal" updates and "moderate" inserts.
>>
>> In order to increase the performance I have the following options.
>> A) IOT with primary key similar to "userid,id"
>
> IOT makes only sense if most accesses are based on the PK. Since in your
> case most accesses are performed through "user_id", it makes no sense to
> use an IOT.
>

If my PK is user_id,id wont the index be used for queries of type "where user_id=<value>" ? My understanding was that since the PK starts with user_id that it would use the PK index.

>> B) Cluster the table on user_id (index)
>> C) Cluster the table on user_id(hash)
>
> Clusters don't support partitioning. With that amount of data, you need
> partitioning.
>
>
> Therefore, IMHO, your only choice is finding the right partitioning
> schema. Probably the most important matter to consider are the
> definition of the PK and the number of distinct values and definition
> (ranges?) for "user_id"
>

I could partition using hash of user_id but this does not solve my problem, i.e. Wanting to "group" rows for a user together to reduce IO. With partitioning for users who have larger number of rows the partition would not only be large but within the partition the rows for a user would not necessarily live together on disk. Also if I was to partition using hash on user_id I would not be able to dictate that the users with more rows should not end up on the same partition. And partitioning on "key" (user_id) where the distribution of the data over the entire range of the key is likely to change may be a tough one to plan for.

>
> HTH
> Chris

-- 
Adi Alurkar
aalurkar_at_linkedin.com
--
http://www.freelists.org/webpage/oracle-l

Received on Fri Aug 18 2006 - 18:41:35 CDT