Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: partitioning and execution plans?

Re: partitioning and execution plans?

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: 2000/08/13
Message-ID: <966162923.15236.0.nnrp-14.9e984b29@news.demon.co.uk>

>I don't quite understand your formula? power(2,N)???? huh? and
>yes, I have 10 partitions, 1 partition per tablespace.
>

power(2,N) - the number of hash partitions should be a power of 2, i.e. 2, 4, 8, 16, 32, ...

>>In general, 10 partitions will lead to 4 small and 6 large
>>partitions, where the large partitions are twice the
>>size of the small partitions.
>>
>Can you please let me know about the formula you have for
>calculating partitions?

Formula for sizing - Oracle allows you to add or remove one hash partition at a time (split/coalesce) without having to rebuild the entire table. In your case, therefore, Oracle will treat 10 partitions as 8 + 2. Partitions 1 and 9 will be the same size as each other as will partitions 2 and 10. If you coalesce once, then 2 and 10 will collapse down to partition 2, do it again and 1 and 9 will collapse down into partition 1. Hence the comment about 6 large and 4 small.
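The arithmetic behind that 8 + 2 split can be sketched in a few lines. This is an illustrative model of the standard linear-hashing scheme the behaviour implies, not Oracle's actual internal hash function (which is not published); the function and constant names are mine:

```python
# Sketch of the linear-hashing bucket arithmetic implied above.
# Oracle's real internal hash is undocumented; any uniform hash
# produces the same partition-size shape.

def bucket_for(hash_value, n_partitions):
    """Map a hash value to one of n_partitions linear-hash buckets.

    With n partitions where 2**(k-1) < n <= 2**k, take the value
    mod 2**k; if that lands on a bucket that does not exist (>= n),
    fold it back with mod 2**(k-1).
    """
    k = (n_partitions - 1).bit_length()   # smallest k with 2**k >= n
    b = hash_value % (2 ** k)
    if b >= n_partitions:
        b = hash_value % (2 ** (k - 1))
    return b                               # 0-based partition number

def bucket_shares(n_partitions, hash_space=2 ** 16):
    """Rows landing in each partition, given a uniform hash space."""
    counts = [0] * n_partitions
    for h in range(hash_space):
        counts[bucket_for(h, n_partitions)] += 1
    return counts

shares = bucket_shares(10)
# Partitions 3-8 (1-based) get twice the share of partitions
# 1, 2, 9 and 10: six large and four small, as described above.
```

Note also that a hash value folding out of the sixteenth slot lands exactly where a coalesce would send it, which is why partition 10 collapses into partition 2.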

>Right now the largest account has 500,000 rows. The newest
>account has 0 rows and the smallest non-zero account has 100
>rows.

Awkward - as you have discovered, Oracle is likely to switch back and forth between indexed paths and scans to deal with the two extremes. What indexes do you have at present?

>Originally the table was normalized without the account_id. We
>found data growing incredibly though. For just 10 accounts, the
>table was accruing almost 1 million records every 2 weeks. I
>knew it was going to be a HUGE table, so I thought partitioning
>would help...and account_id seemed the most logical choice.
>

Hash partitioning really requires you to have a reasonably large number of different values for the partitioning column. Hashing for 'partition count = number of distinct values' is not a good setup, even experimentally. You are unlikely to get one account per partition.
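A quick simulation makes the point. This uses an arbitrary md5-based stand-in for Oracle's undocumented hash, together with the same linear-hash folding as above; the names and the choice of md5 are assumptions for illustration only:

```python
# Illustrative simulation (not Oracle's real hash): how often do 10
# distinct account_ids spread themselves one-per-partition across 10
# hash partitions? Collisions and empty partitions are the normal
# outcome; a perfect spread is a rare accident.
import hashlib
import random

def bucket_for(account_id, n_partitions=10):
    # Deterministic stand-in hash; Oracle's internal function differs.
    h = int.from_bytes(hashlib.md5(str(account_id).encode()).digest()[:4], "big")
    k = (n_partitions - 1).bit_length()
    b = h % (2 ** k)
    return b if b < n_partitions else h % (2 ** (k - 1))

random.seed(0)
trials = 2000
perfect = 0
for _ in range(trials):
    ids = random.sample(range(1, 10 ** 6), 10)
    if len({bucket_for(i) for i in ids}) == 10:
        perfect += 1

# Even with a uniform hash, far fewer than 1% of random sets of
# 10 account_ids land one-per-partition.
```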

>First of all, this is a development database and I am a new
>DBA. I am actually an app developer trying to move into the
>role of a dba...so I am experimenting a lot.
>

This is a non-trivial problem, and you are addressing it with a new feature of Oracle. You may get lucky, but really no-one should ask a 'new DBA' to try to solve a problem like this. Get your management to call in a database architect for a few days. (I think I can say that without prejudice, since I'm based several thousand miles away and not likely to be a candidate for the task).

--

Jonathan Lewis
Yet another Oracle-related web site:  http://www.jlcomp.demon.co.uk

gdas wrote in message <0becf590.960f0885_at_usw-ex0102-015.remarq.com>...

>"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:
>>
>>We really need an idea of the total volume
>>of data and the variation in data volumes per
>>account.

>>
>>One starting thought - you say you have partitioned
>>over 10 tablespaces, does this also mean 10 partitions ?
>>You should always hash partition over power(2,N)
>>partitions to maximise your chances of equally sized
>>partitions.
>>

>>If you partition by hash (account_id), then any query
>>of the form:
>> account_id = {constant}
>>will automatically eliminate all but one partition, whether
>>or not the account exists at all - Oracle will simply access
>>the partition where it would be if it existed.
>>
>>Your best bet for getting the best access path all the
>>time is to analyze the index for the column account_id
>>to get histograms generated.
>
>I have tried this...and have had the same or worse results
>as when I was generating table statistics.
>
>>
>>One thought, however, for handling historical account
>>information - you might consider composite partitioning -
>> range partitioning by date, and hash subpartitioning
>>by account_id.
>>
>Unfortunately, I did not give enough info previously. I can't
>explain the real intent of the application. I can, but I'm not
>supposed to due to disclosure contracts. The historical data is
>not date-based. It is more like an account acquires a
>characteristic, and as time goes by an account acquires more and
>more characteristics. When the account acquired a
>characteristic is not important. What's most important is all
>the characteristics that have been acquired by an account.
>
>The table in question looks like this:
>
>characteristic_id number,
>account_id number,
>reference_id number,
>characteristic_type varchar2(100),
>value varchar2(100)
>
>characteristic_id is the primary key
>reference_id is the foreign key that is used to join the table
>to other tables.
>account_id is the partitioning column.
>
>Logically, an account has many references (each account has
>anywhere from 1-100 references). Each reference has 1-1000
>characteristics per account.
>
>
>First of all, this is a development database and I am a new
>DBA. I am actually an app developer trying to move into the
>role of a dba...so I am experimenting a lot.
>
>As you might be able to tell, I denormalized the schema by
>adding account_id to the characteristic table. The only reason
>I did this was specifically to take advantage of partitioning.
>
>Originally the table was normalized without the account_id. We
>found data growing incredibly though. For just 10 accounts, the
>table was accruing almost 1 million records every 2 weeks. I
>knew it was going to be a HUGE table, so I thought partitioning
>would help...and account_id seemed the most logical choice.
>
>I'm not sure what to try next. I'd appreciate any tips.
>>
>>--
>>
>>Jonathan Lewis
>>Yet another Oracle-related web site: http://www.jlcomp.demon.co.uk
>>
>>gdas wrote in message <004e2dc6.f97dbd35_at_usw-ex0101-006.remarq.com>...
>>>I found an interesting issue today...
>>>
>>>I have a large table that is hash partitioned across 10
>>>tablespaces by an account_id. This was done because a user will
>>>always be viewing information for their own account. However,
>>>the information for each account can have different amounts of
>>>data and thus varied cardinalities.
>>>
>>>I've been experimenting with analyzing this table with different
>>>levels of sampling. I've even done the same for all the
>>>individual partitions.
>>>
>>>The problem is that the execution plan is highly data
>>>dependent. An execution plan for one account that works well,
>>>doesn't work well for another account if the amount of data is
>>>greatly varied.
>>>
>>>Additionally, if an account does not yet exist in this
>>>table...an incredibly slow query is sent out which I have to
>>>kill after 2 minutes. This table is tracking history. And new
>>>accounts have no history yet... accounts that do have history
>>>come back in anywhere from 2-60 seconds (due to the problem above).
>>>
>>>I can logically deduce that since there is no record for the
>>>account in the history table yet, that there is no hash bucket
>>>yet assigned. So Oracle might be doing a full table scan here.
>>>(I do have local indexes created for the account_id).
>>>Also, I did an explain plan on a simple query for an account
>>>that did not exist yet "Select count(*) from table where
>>>account_id = n "...to see what partitions were being scanned.
>>>The output showed a start and stop on just a single partition.
>>>(Why it is scanning just one partition I don't know...but I
>>>suppose if it scanned all 10, the query would be ten times
>>>slower).
>>>
>>>For manageability, I don't want to have to create 1 partition
>>>per account.
>>>
>>>
>>>Is there a better way to gather statistics in order to account
>>>for the variances in cardinalities for individual accounts that
>>>exist within a single partition?
>>>
>>>Any ideas?
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
Received on Sun Aug 13 2000 - 00:00:00 CDT

