dealing with skewed data

From: Richard Kuhler <noone_at_nowhere.com>
Date: Thu, 15 Jan 2004 23:41:32 GMT
Message-ID: <MyFNb.82210$Vs3.69341@twister.socal.rr.com>

I'm looking for suggestions and experience on dealing with skewed data in a datawarehouse. Unfortunately, I don't see how histograms can help with what I believe is a fairly standard star schema. Imagine a dimension table D that has two rows with natural keys 'D1' and 'D2'. Imagine a fact table F that has 10 million rows. The typical query looks something like this:

select f.*
from d, f
where d.key = 'D1'

and f.d_id = d.d_id

Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'. When querying with 'D1' we want to do a full table scan, with 'D2' an index scan is faster.

What can be done to get the optimizer on track? Obviously, we could eliminate the surrogate keys and then build a histogram on the fact table. Is that how this is typically handled? We're also considering bitmap join indexes but in reality these queries typically select many columns from the dimension tables as well.

Ideas?

Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.2.0 - Production

--
Richard Kuhler

Received on Thu Jan 15 2004 - 17:41:32 CST