
Re: partitioning and execution plans?

From: gdas <gdas1NOgdSPAM_at_yahoo.com.invalid>
Date: 2000/08/13
Message-ID: <0becf590.960f0885@usw-ex0102-015.remarq.com>

"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:
>
>We really need an idea of the total volume
>of data and the variation in data volumes per
>account.

Right now the largest account has 500,000 rows. The newest account has 0 rows and the smallest non-zero account has 100 rows.
>
>One starting thought - you say you have partitioned
>over 10 tablespaces, does this also mean 10 partitions ?
>You should always hash partition over power(2,N)
>partitions to maximise your chances of equal-sized
>partitions.
>

I don't quite understand your formula: power(2,N)? And yes, I have 10 partitions, one partition per tablespace.

>In general, 10 partitions will lead to 4 small and 6 large
>partitions, where the large partitions are twice the
>size of the small partitions.
>
Can you please let me know about the formula you have for calculating partitions?
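In case it helps, my tentative reading is: if power(2,N) just means 2^N partitions (2, 4, 8, 16, ...), then I'm guessing the 4-small/6-large split comes from the hash values being folded back onto the 10 partitions. Something like this, if I follow:

   bucket = mod(hash_value, 16)               (16 = next power of two above 10)
   if bucket >= 10 then bucket = bucket - 8   (folded back onto an existing partition)

   buckets 0,1,8,9  -> partitions 1,2,9,10 : one bucket each  (4 small partitions)
   buckets 2..7     -> partitions 3..8     : these also catch buckets 10..15,
                                             so two buckets each (6 large partitions)

If that reading is right, declaring 16 partitions would give every partition exactly one bucket.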

>If you partition by hash (account_id), then any query
>of the form:
> account_id = {constant}
>will automatically eliminate all but one partition, whether
>or not the account exists at all - Oracle will simply access
>the partition where it would be if it existed.
>
>Your best bet for getting the best access path all the
>time is to analyze the index for the column account_id
>to get histograms generated.

I have tried this... and have had the same or worse results as when I was generating table statistics.
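Roughly what I tried was along these lines (the real table name is disguised as "characteristics" here, and I'm quoting from memory, so the exact calls may have differed slightly):

-- column histogram on the partitioning column, up to 254 buckets
ANALYZE TABLE characteristics
  COMPUTE STATISTICS FOR COLUMNS account_id SIZE 254;

-- and the DBMS_STATS route, at table and partition granularity
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'CHARACTERISTICS',
    method_opt  => 'FOR COLUMNS ACCOUNT_ID SIZE 254',
    granularity => 'ALL',
    cascade     => TRUE);
END;
/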

>
>One thought, however, for handling historical account
>information - you might consider composite partitioning -
> range partitioning by date, and hash subpartitioning
>by account_id.
>
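(If I understand the suggestion, it would look something like the sketch below. The acquired_date column is hypothetical, purely to illustrate the shape of the DDL; as I explain next, my data doesn't really have a date like that.)

CREATE TABLE characteristics_by_date (
  characteristic_id    NUMBER,
  account_id           NUMBER,
  reference_id         NUMBER,
  characteristic_type  VARCHAR2(100),
  value                VARCHAR2(100),
  acquired_date        DATE              -- hypothetical column for the range key
)
PARTITION BY RANGE (acquired_date)
  SUBPARTITION BY HASH (account_id) SUBPARTITIONS 8
( PARTITION p_1999 VALUES LESS THAN (TO_DATE('2000-01-01','YYYY-MM-DD')),
  PARTITION p_2000 VALUES LESS THAN (TO_DATE('2001-01-01','YYYY-MM-DD')),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);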
Unfortunately, I did not give enough info previously. I can't explain the real intent of the application (I could, but I'm not supposed to due to disclosure contracts). The historical data is not date-based. It is more like an account acquires a characteristic, and as time goes by an account acquires more and more characteristics. When the account acquired a characteristic is not important. What's most important is all the characteristics that have been acquired by an account.

The table in question looks like this:

characteristic_id number,
account_id number,
reference_id number,
characteristic_type varchar2(100),
value varchar2(100)

characteristic_id is the primary key.
reference_id is the foreign key that is used to join the table to other tables.
account_id is the partitioning column.
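
Roughly, the current DDL looks like this (real object names disguised, storage clauses omitted, so treat it as a sketch rather than the exact definition):

CREATE TABLE characteristics (
  characteristic_id    NUMBER PRIMARY KEY,
  account_id           NUMBER,
  reference_id         NUMBER,
  characteristic_type  VARCHAR2(100),
  value                VARCHAR2(100)
)
PARTITION BY HASH (account_id)
PARTITIONS 10
STORE IN (char_ts01, char_ts02, char_ts03, char_ts04, char_ts05,
          char_ts06, char_ts07, char_ts08, char_ts09, char_ts10);

-- local index on the partitioning column
CREATE INDEX characteristics_acct_ix
  ON characteristics (account_id) LOCAL;

-- the typical application query, which should prune to a single
-- partition (one Pstart/Pstop pair in the plan):
SELECT COUNT(*) FROM characteristics WHERE account_id = 12345;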

Logically, an account has many references (each account has anywhere from 1 to 100 references), and each reference has anywhere from 1 to 1,000 characteristics per account.

First of all, this is a development database and I am a new DBA. I am actually an app developer trying to move into the role of a DBA... so I am experimenting a lot.

As you might be able to tell, I denormalized the schema by adding account_id to the characteristic table. The only reason I did this was specifically to take advantage of partitioning.

Originally the table was normalized without the account_id, but we found the data growing incredibly fast. For just 10 accounts, the table was accruing almost 1 million records every 2 weeks. I knew it was going to be a HUGE table, so I thought partitioning would help... and account_id seemed the most logical choice.

I'm not sure what to try next. I'd appreciate any tips.
>
>--
>
>Jonathan Lewis
>Yet another Oracle-related web site:
>http://www.jlcomp.demon.co.uk
>
>gdas wrote in message <004e2dc6.f97dbd35_at_usw-ex0101-006.remarq.com>...
>>I found an interesting issue today...
>>
>>I have a large table that is hash partitioned across 10
>>tablespaces by an account_id. This was done because a user will
>>always be viewing information for their own account. However,
>>the information for each account can have different amounts of
>>data and thus varied cardinalities.
>>
>>I've been experimenting with analyzing this table with different
>>levels of sampling. I've even done the same for all the
>>individual partitions.
>>
>>The problem is that the execution plan is highly data
>>dependent. An execution plan that works well for one account
>>doesn't work well for another account if the amount of data is
>>greatly varied.
>>
>>Additionally, if an account does not yet exist in this
>>table...an incredibly slow query is sent out which I have to
>>kill after 2 minutes. This table is tracking history. And new
>>accounts have no history yet... accounts that do have history
>>come back in anywhere from 2 - 60 seconds (due to the problem
>>above).
>>
>>I can logically deduce that since there is no record for the
>>account in the history table yet, there is no hash bucket
>>yet assigned. So Oracle might be doing a full table scan here.
>>(I do have local indexes created for the account_id).
>>Also, I did an explain plan on a simple query for an account
>>that did not exist yet "Select count(*) from table where
>>account_id = n "...to see what partitions were being scanned.
>>The output showed a start and stop on just a single partition.
>>(Why it is scanning just one partition I don't know...but I
>>suppose if it scanned all 10, the query would be ten times
>>slower).
>>
>>For manageability, I don't want to have to create 1 partition
>>per account.
>>
>>
>>Is there a better way to gather statistics in order to account
>>for the variances in cardinalities for individual accounts that
>>exist within a single partition?
>>
>>Any ideas?


Received on Sun Aug 13 2000 - 00:00:00 CDT

