Re: dealing with skewed data

From: Daniel Morgan <damorgan_at_x.washington.edu>
Date: Thu, 15 Jan 2004 16:00:05 -0800
Message-ID: <1074211131.729932@yasure>

Richard Kuhler wrote:

> I'm looking for suggestions and experience on dealing with skewed data
> in a datawarehouse. Unfortunately, I don't see how histograms can help
> with what I believe is a fairly standard star schema. Imagine a
> dimension table D that has two rows with natural keys 'D1' and 'D2'.
> Imagine a fact table F that has 10 million rows. The typical query
> looks something like this:
>
> select f.*
> from d, f
> where d.key = 'D1'
> and f.d_id = d.d_id
>
> Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
> When querying with 'D1' we want to do a full table scan, with 'D2' an
> index scan is faster.
>
> What can be done to get the optimizer on track? Obviously, we could
> eliminate the surrogate keys and then build a histogram on the fact
> table. Is that how this is typically handled? We're also considering
> bitmap join indexes but in reality these queries typically select many
> columns from the dimension tables as well.
>
> Ideas?
>
>
> Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.2.0 - Production
>
> --
> Richard Kuhler

How about dumping the inner join and using a SUBSELECT with IN or EXISTS for one of them? Even a set operator like intersect or minus. Anything to force a different look at the query.

-- 
Daniel Morgan
http://www.outreach.washington.edu/ext/certificates/oad/oad_crs.asp
http://www.outreach.washington.edu/ext/certificates/aoa/aoa_crs.asp
damorgan_at_x.washington.edu
(replace 'x' with a 'u' to reply)

Received on Thu Jan 15 2004 - 18:00:05 CST