Re: sane number of the table partitions in DWH

From: Milen Kulev <makulev_at_gmx.net>
Date: Sun, 05 Apr 2009 20:52:51 +0200
Message-ID: <49D8FE03.6070007_at_gmx.net>



Hello Greg,
I agree, the "bitmap conversion filter ratio" of 10M -> ~800 is really bad. My customer has decided to implement the new partitioning scheme with that many partitions (~80k).
This should solve the inefficient "bitmap conversion filter ratio" problem. Many thanks to all who contributed to this thread.

Best Regards.
Milen

Greg Rahn wrote:
> I would agree with Riyaj's comments. Simply put, 10 million rows going
> through a BITMAP CONVERSION TO ROWIDS is way too many to perform well.
> Personally, I would probably draw the line at 250K as a reasonable
> number, before going back to revisit the physical design options.
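>
> As a rough sketch (the table and column names below are just made up),
> you can see how many rows actually flow through that BITMAP CONVERSION
> TO ROWIDS step by running the statement with rowsource statistics and
> pulling the plan back from the cursor cache:
>
>   SELECT /*+ gather_plan_statistics */ SUM(f.amount)
>     FROM fact_sales f
>    WHERE f.period_id = 200901
>      AND f.domain_product_id = 42;
>
>   -- A-Rows on the BITMAP CONVERSION TO ROWIDS line is the number to watch
>   SELECT *
>     FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));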
>
> I would also comment that parse time is probably the least of your
> worries. Personally I have never seen it be a significant issue in a
> warehouse. Do be aware of the segment header caching issue, but given
> a frequently used set of partitions and a large enough buffer cache
> and the bug fix, that should be a non-issue as well.
>
> Let's talk about some requirements and design:
>
>
>> Data volatility:
>> This is an operational DWH database. Data is loaded almost constantly (mainly INSERTs + UPDATEs).
>> Bitmap indexes are getting bloated/corrupted and must often be rebuilt.
>> Furthermore, data in the fact tables cannot be pre-aggregated and/or compressed (because of the constant data changes)
>>
>
>
>> The minimum time granule for reports is 1 week. The typical time granule is 3 months.
>> There are users who need 1-month time granules. Many customers are accessing this resource base,
>> and each customer has its own requirements in terms of time aggregations, region aggregations, and product type
>> aggregations.
>>
>
> These two requirements seem to conflict somewhat in my mind. If the
> minimum time granule is 1 week, why not bulk load daily? What is the
> requirement for trickle feed if there is not up-to-the-minute
> reporting? I bring this up for a few reasons. First, compression. I
> consider the use of table compression a requirement for data
> warehousing (the other two being partitioning and parallel query).
> The exception being if the compression ratio is too low to reap gains
> from it (say below 2:1). Since you are not on 11g, Advanced
> Compression is not an option. Compression is not about space savings
> per se, it is about reduction in data transfer. Think: why do people
> compress/zip files for email or downloads? It is not to save space.
> It reduces the bandwidth requirement for the file transfer, which is
> frequently the problem resource in data warehouses.
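>
> As a very rough sketch of what I mean (object names are only
> placeholders, and on 10g this is basic compression, so only
> direct-path/bulk-loaded rows are compressed):
>
>   -- build the fact table (or individual partitions) with COMPRESS, e.g.
>   CREATE TABLE fact_sales_new COMPRESS
>   AS SELECT * FROM fact_sales;
>
>   -- load the daily batch via direct path so the new rows are compressed;
>   -- conventional INSERTs and UPDATEs leave rows uncompressed on 10g
>   INSERT /*+ APPEND */ INTO fact_sales_new
>   SELECT * FROM stg_sales_daily;
>   COMMIT;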
>
>
>> The idea is to have many partitions, so that more granular partition pruning can occur.
>> Each partition should be read via FTS using multiblock reads (MBRC=16)
>>
>
> I'm all for good partitioning and full partition scans using parallel
> query, but you must also consider that if you have inserts/updates going
> on, a parallel query will force all the dirty blocks to be flushed to
> disk so that PQ can read them back via direct (physical) reads. This may
> be problematic depending on how frequently queries are executed, and it
> brings me back to the comments about the requirement for trickle feed.
> Also I would leave your MBRC unset and let it default. This will
> give you the optimal I/O size (1MB) for FTS.
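>
> To illustrate (the degree, names and predicates are only examples): leave
> db_file_multiblock_read_count unset and let partition pruning plus PQ do
> the heavy lifting:
>
>   SELECT /*+ parallel(f, 8) */
>          f.region_id, SUM(f.amount)
>     FROM fact_sales f
>    WHERE f.period_id BETWEEN 200901 AND 200913
>      AND f.domain_product_id IN (10, 20)
>    GROUP BY f.region_id;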
>
>
>> My customer plans to use a RANGE->LIST partitioning scheme:
>> range on time (period_id), 280 partitions (slowly changing over time),
>> list on product type (domain_product_id), 300 distinct values (fixed).
>>
>> Thus we should get 280 x 300 = 84,000 partitions. I *suppose* we should get reasonable parse times.
>>
>
> This sounds reasonable, but again, I don't think your worry should be
> parse times. Especially since I have seen zero evidence that this is
> actually an issue.
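>
> For reference, that scheme would look something like the sketch below
> (names, values and range boundaries are just placeholders, and in
> practice you would spell out all 300 list values in the subpartition
> template rather than the handful shown here):
>
>   CREATE TABLE fact_sales (
>     period_id          NUMBER NOT NULL,
>     domain_product_id  NUMBER NOT NULL,
>     region_id          NUMBER,
>     amount             NUMBER
>   )
>   PARTITION BY RANGE (period_id)
>   SUBPARTITION BY LIST (domain_product_id)
>   SUBPARTITION TEMPLATE (
>     SUBPARTITION sp_prod_010 VALUES (10),
>     SUBPARTITION sp_prod_020 VALUES (20),
>     SUBPARTITION sp_other    VALUES (DEFAULT)
>   )
>   (
>     PARTITION p_200901 VALUES LESS THAN (200902),
>     PARTITION p_200902 VALUES LESS THAN (200903)
>   );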
>
> When it comes to query acceleration for data warehousing it basically
> comes down to a few things. First, for keyed access (single record
> look-ups) indexes work great. For analytic queries (big aggregations)
> it's all about partitioning and choosing your partition and
> subpartition keys wisely so as to maximize data elimination. After
> that, toss in some parallel query (PQ) to scan those partitions in
> parallel. This will take hardware resources, mainly CPU and I/O
> bandwidth. If you need more acceleration yet, then work smarter: look
> into building aggregations using materialized views and query rewrite
> or buy more hardware.
>
> You can use brute force (parallel query and full partition scans) or
> brains (mat views and query rewrite), or a mix of both. All depends
> on the requirements.
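>
> For example (again just a sketch with made-up names), a pre-built
> aggregate the optimizer can transparently rewrite queries against:
>
>   CREATE MATERIALIZED VIEW mv_sales_agg
>     BUILD IMMEDIATE
>     REFRESH FORCE ON DEMAND
>     ENABLE QUERY REWRITE
>   AS
>   SELECT period_id, domain_product_id, region_id,
>          SUM(amount) AS amount, COUNT(*) AS cnt
>     FROM fact_sales
>    GROUP BY period_id, domain_product_id, region_id;
>
>   -- query_rewrite_enabled must be TRUE (the default) for queries against
>   -- fact_sales to be redirected to the MV where it can answer them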
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Apr 05 2009 - 13:52:51 CDT
