Re: sane number of the table partitions in DWH

From: Greg Rahn <>
Date: Tue, 31 Mar 2009 09:47:42 -0700
Message-ID: <>

My first comments would be if you are going to have an extreme design, then your testing should also be extreme, in both scale and depth.

> The number of calculated/estimated partitions is 200 000 - 300 000 (Yep, this is NOT a typing error !).

Is this total for the db or per table?

> 1) What is the "manageable" number of partitions (from performance point of view == parse times) that I you have seen by clients.

The number of partitions does affect parsing, but it also has an impact on the buffer cache. If the segment header is not in the buffer cache the execution will take longer than if it was (slightly obvious). The not obvious part of this is that if you are using parallel query, the query coordinator reads the extent maps serially so there can be a long time between start of query and when the PQ slaves start their work, unless you are on or have bug 6525904 fix as a one-off. To put this in perspective, for a FTS of a table of 92,160 segments I have seen this "stall" take 30 minutes prior to the bug fix.

> 2) Any parameters (also underscore ...) to reduce the hard-parse times (because of the enormous amount of partitions) ?

Set your buffer cache large enough to keep all the segment headers in it.

> 3) If I have so many partitions , what is the influence on the fragmentation of the shared pool? Any experiences ?

I don't think this is really an issue. Warehouses usually hard parse everything, use it once, and it ages out fast, so there is likely little need for more than a few GB of shared pool memory.

> 4) Is multi-column partitioning  a save bet (bugs) ?

Multi-column partitioning can work, however it behaves different when it comes to partition elimination based on the type of partitioning and how many of the partitioning keys are present in the predicate in the query. Test this well noting partition elimination before deciding on a design.

> Fact tables have sizes ~ 700 - 1000 GB.
> Application type = DWH ( ~ 4.5 TB)
> At the moment partitioning schmeme is range(date)/hash(location_id)

Based on these numbers I don't really see the benefit for the large number of partitions. A 4TB table with 200K partitions means 20MB partitions (assuming even distribution). That is quite small for a partition size. I would say too small to warrant significant benefit.  And since you mention the fact tables are up to 1TB, well, that means single MB or smaller for each partition segment. That is probably too small to be overly useful.

I would suggest range(date) with a granule size of 1 day because that tends to be the unit that business reports frequently use. Unless you are either 1) doing fact-fact joins and want to leverage partition-wise joins or 2) single key access via location_id and not using indexes, then I would see little benefit from the hash subpartitioning at that scale. Of course, this is based off very little provided information, so take it with a grain of salt.

Greg Rahn
Received on Tue Mar 31 2009 - 11:47:42 CDT

Original text of this message