Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.server -> Re: dealing with skewed data
It's unfortunate that the CBO has a number of variations on the basic calculations that kick in when the numbers of rows are very small.
I think you are seeing the result of two symptoms. First that the join selectivity algorithm is doing something non-standard when the driving table is known to return just one row. Second that Oracle uses (or seems to use) number of distinct values for join estimates rather than the density - not that it would help you to use the density, because you still only get one access path to cover two very different queries.
By the way, you don't get a star_transformation unless you have at least two dimension tables involved, so if your query is a very common requirement it puts you a little to the edge of the standard 'star schema' processing and you might get some benefit if you can partition your table on the target column so that you get one small and one large partition. Oracle has a run-time join optimisation that would allow it to determine whether it should be visiting the small or large partition - apart from that, I don't know of any mechanism that Oracle has that would allow it to determine that the value in the row selected for the join is the special case value in the fact table distribution.
If your queries are only for
d.key = 'D1'
and not
d.other_column = 'xxx'
then you might consider a
function-based bitmap index
on the fact table (perhaps on decode()
if the dimension table is guaranteed
to be static), rather than a join index.
-- Regards Jonathan Lewis http://www.jlcomp.demon.co.uk The educated person is not the person who can answer the questions, but the person who can question the answers -- T. Schick Jr Next public appearance2: March 2004 Hotsos Symposium - Keynote March 2004 Charlotte NC - OUG Tutorial April 2004 Iceland One-day tutorials: http://www.jlcomp.demon.co.uk/tutorial.html Three-day seminar: see http://www.jlcomp.demon.co.uk/seminar.html ____UK___February The Co-operative Oracle Users' FAQ http://www.jlcomp.demon.co.uk/faq/ind_faq.html "Richard Kuhler" <noone_at_nowhere.com> wrote in message news:MyFNb.82210$Vs3.69341_at_twister.socal.rr.com...Received on Mon Jan 19 2004 - 10:05:17 CST
> I'm looking for suggestions and experience on dealing with skewed data
> in a datawarehouse. Unfortunately, I don't see how histograms can help
> with what I believe is a fairly standard star schema. Imagine a
> dimension table D that has two rows with natural keys 'D1' and 'D2'.
> Imagine a fact table F that has 10 million rows. The typical query
> looks something like this:
>
> select f.*
> from d, f
> where d.key = 'D1'
> and f.d_id = d.d_id
>
> Now let's say 99% of those rows belong to dimension 'D1' and 1% to 'D2'.
> When querying with 'D1' we want to do a full table scan, with 'D2' an
> index scan is faster.
>
> What can be done to get the optimizer on track? Obviously, we could
> eliminate the surrogate keys and then build a histogram on the fact
> table. Is that how this is typically handled? We're also considering
> bitmap join indexes but in reality these queries typically select many
> columns from the dimension tables as well.
>
> Ideas?
>
>
> Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.2.0 - Production
>
> --
> Richard Kuhler
>
![]() |
![]() |