Re: sane number of the table partitions in DWH

From: Greg Rahn <greg_at_structureddata.org>
Date: Thu, 2 Apr 2009 15:28:34 -0700
Message-ID: <a9c093440904021528ye9cfde7jc4708defe560c6d2_at_mail.gmail.com>



I would agree with Riyaj's comments. Simply put, a BITMAP CONVERSION TO ROWIDS that produces 10 million rowids is way too many to perform well. Personally, I would probably draw the line at 250K as a reasonable number before going back to revisit the physical design options.

I would also comment that parse time is probably the least of your worries; personally, I have never seen it be a significant issue in a warehouse. Do be aware of the segment header caching issue, but given a frequently used set of partitions, a large enough buffer cache, and the bug fix, that should be a non-issue as well.

Let's talk about some requirements and design:

> Data volatility:
> This is an operational DWH database. Data is almost steadily loaded (mainly INSERTs + UPDATEs).
> Bitmap indexes are getting bloated/corrupt and must often be rebuilt.
> Furthermore, data in the fact tables cannot be pre-aggregated and/or compressed (because of steady data changes).

> The minimum time granule for reports is 1 week. The typical time granule is 3 months.
> There are users who need 1-month time granules. Many customers are accessing this resource base
> and each customer has its own requirements in terms of time aggregations, region aggregations, and product type
> aggregations.

These two requirements seem to conflict somewhat in my mind. If the minimum time granule is 1 week, why not bulk load daily? What is the requirement for trickle feed if there is no up-to-the-minute reporting? I bring this up for a few reasons. First, compression. I consider the use of table compression a requirement for data warehousing (the other two being partitioning and parallel query), the exception being if the compression ratio is too low to reap gains from it (say, below 2:1). Since you are not on 11g, Advanced Compression is not an option. Compression is not about space savings per se, it is about reduction in data transfer. Think: why do people compress/zip files for email or downloads? It is not to save space. It reduces the bandwidth requirement for the file transfer, which is frequently the bottleneck resource in data warehouses.
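To make that concrete, here is a rough sketch of a compressed, range-partitioned fact table with a daily direct-path bulk load. The table and column names (sales_fact, sales_stage, period_id, etc.) are placeholders I am assuming, not your actual schema, and note that basic compression in 10g only takes effect for direct-path operations:

    CREATE TABLE sales_fact (
      period_id          NUMBER,
      domain_product_id  NUMBER,
      customer_id        NUMBER,
      amount             NUMBER
    )
    COMPRESS                      -- basic table compression (pre-11g)
    PARTITION BY RANGE (period_id) (
      PARTITION p_200901 VALUES LESS THAN (200902),
      PARTITION p_200902 VALUES LESS THAN (200903)
    );

    -- daily bulk load; the APPEND hint forces a direct-path insert,
    -- which is what actually gets the new blocks compressed in 10g
    INSERT /*+ APPEND */ INTO sales_fact
    SELECT period_id, domain_product_id, customer_id, amount
    FROM   sales_stage;
    COMMIT;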

> The idea is to have many partitions, so that more granular partition pruning can occur.
> Each partition should be read via FTS using multiblock reads (MBRC=16)

I'm all for good partitioning and full partition scans using parallel query, but you must also consider that if you have inserts/updates going on, a parallel query will force all the dirty blocks for the segment to be flushed to disk so that PQ can read them back via direct (physical) reads. This may be problematic depending on how frequently queries are executed, and it brings me back to my comments about the requirement for trickle feed. Also, I would leave your MBRC unset and let it default. This will give you the optimal I/O size (1MB) for FTS.
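As a sketch of both points (the reset syntax is 10gR2, and the query, degree of parallelism, and names are just assumptions carried over from the example above):

    -- let Oracle derive the multiblock read count automatically;
    -- when left unset, 10gR2 sizes reads up to 1MB for full scans
    ALTER SYSTEM RESET db_file_multiblock_read_count SCOPE=SPFILE SID='*';

    -- a parallel full partition scan; the PQ slaves use direct path
    -- reads, which first checkpoint any dirty buffers for the segment
    SELECT /*+ PARALLEL(f, 8) */ domain_product_id, SUM(amount)
    FROM   sales_fact f
    WHERE  period_id BETWEEN 200901 AND 200903
    GROUP  BY domain_product_id;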

> My customer plans to use a RANGE->LIST partitioning scheme:
> range on time (period_id), 280 partitions (slowly changing over time),
> list on product type (domain_product_id), 300 distinct values (fixed).
>
> Thus we should get 280 x 300 = 84,000 partitions. I *suppose* we should get reasonable parse times.

This sounds reasonable, but again, I don't think your worry should be parse times, especially since I have seen zero evidence that this is actually an issue.
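For reference, here is a rough sketch of such a composite range-list scheme, using the same made-up names as above (a subpartition template saves you from spelling out the 300 list values for every range partition; your real bounds and values would obviously differ):

    CREATE TABLE sales_fact (
      period_id          NUMBER,
      domain_product_id  NUMBER,
      amount             NUMBER
    )
    PARTITION BY RANGE (period_id)
    SUBPARTITION BY LIST (domain_product_id)
    SUBPARTITION TEMPLATE (
      SUBPARTITION sp_prod_101 VALUES (101),
      SUBPARTITION sp_prod_102 VALUES (102),
      SUBPARTITION sp_other    VALUES (DEFAULT)  -- ... one per product type
    )
    (
      PARTITION p_200901 VALUES LESS THAN (200902),
      PARTITION p_200902 VALUES LESS THAN (200903)
      -- ... one range partition per period, roughly 280 in total
    );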

When it comes to query acceleration for data warehousing, it basically comes down to a few things. First, for keyed access (single-record look-ups), indexes work great. For analytic queries (big aggregations), it's all about partitioning and choosing your partition and subpartition keys wisely so as to maximize data elimination. After that, toss in some parallel query (PQ) to scan those partitions in parallel. This will take hardware resources, mainly CPU and I/O bandwidth. If you need more acceleration yet, then work smarter: look into building aggregations using materialized views and query rewrite, or buy more hardware.
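If you go the materialized view route, a minimal sketch looks something like this (again, made-up names; you would pick the aggregation level to match your common report granules):

    CREATE MATERIALIZED VIEW sales_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
      ENABLE QUERY REWRITE
    AS
    SELECT period_id, domain_product_id,
           SUM(amount) AS amount_sum, COUNT(*) AS row_cnt
    FROM   sales_fact
    GROUP  BY period_id, domain_product_id;

    -- with query_rewrite_enabled = true, the optimizer can transparently
    -- rewrite matching aggregate queries against sales_fact to use sales_mv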

You can use brute force (parallel query and full partition scans) or brains (materialized views and query rewrite), or a mix of both. It all depends on the requirements.

-- 
Regards,
Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Apr 02 2009 - 17:28:34 CDT
