Re: Data Warehouse Guru Question (Data Model question) between FACT and DIMENSION tbl

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Tue, 3 Aug 2021 17:52:55 +0100
Message-ID: <CAGtsp8nq=Xd_bFyGJLh2cOTViFkvvq2-KfdM-PkZwiQeJ69eXg_at_mail.gmail.com>



>> But then we have an additional FILTER being applied to the DIMENSION
table.
>> Currently this doesn't kill performance, but definitely adds about 25%
execution time (6 secs out of 24) and I'm concerned this will get worse as data volumes grow.
>> So the question is , should CASE statement filter really get applied to
the FACT table and replicate that column (or some version of it) to the FACT table instead? (Generally as a rule of thumb I mean)

The fragment you've shown looks perfectly normal for a bitmap star transformation with joinback to dimension when reporting any dimension columns that aren't key columns.
(I'm assuming the filter predicate ends with " = 'something}' ". That's just the generic way that Oracle deals with the star transformation when a joinback is needed.

Adding "FIELD1" to the index supporting the PK is the sensible option if you want to improve performance - just as carrying columns with any primary key index is for any type of query.

It shouldn't be "unusual" to have non-key predicates against dimension tables, and I'd expect most queries to be doing that type of thing anyway.

The only thing that would make your plan odd (for a bitmap star transformation) would be if that predicate with the case experession hadn't also been used in the first pass against the dimension table that genrated the primary keys used to select the bitmaps from the fact table indexes.

Regards
Jonathan Lewis

On Wed, 28 Jul 2021 at 14:41, Chris Taylor <christopherdtaylor1994_at_gmail.com> wrote:

> This is a sanity check question as I'm helping out with a data model
> design and indexing strategy with our ETL team. None of us are what I'd
> call real strong on data warehouse design.
>
> Here's the setup and the question:
>
> Setup:
> We have a DIMENSION table partitioned by SITE (Client) and we have a FACT
> table partitioned by SITE (Client).
>
> The PK on the DIMENSION is a combo of SITE+KEY_FIELD
> The join between the DIMENSION and the FACT is on SITE+KEY_FIELD
>
> All that is fine and works great with bitmap indexes on the FACT etc.
>
> *HOWEVER* here is the problem:
>
> In one particular query, we're applying an *additional FILTER *condition
> to the DIMENSION table (instead of the FACT) after the join which causes a
> bit of performance penalty as the rows are found between the DIMENSION and
> FACT but then the filter is then applied to the rows from the DIMENSION
> table (which is very large table also).
>
> My theory is that this filter clause (the columns used by the filter)
> should actually also be in the FACT and apply the filter there with a
> BITMAP index instead of applying it to the DIMENSION. RIght now the
> condition (columns) in the DIMENSION that are checked for a value do not
> exist in the FACT table..
>
> The join & condition is something like this:
>
>
>
>
>
>
>
> *| 35 | PARTITION LIST ITERATOR |
> | 0 | 1 | | 1 (0)| 00:00:01
> | KEY | KEY | Q1,01 | PCWP|* 36 | INDEX UNIQUE SCAN
> | DIMENSION_TABLE_NDX2 | 0 | 1 |
> | 1 (0)| 00:00:01 | KEY | KEY | Q1,01 | PCWP|* 37 |
> TABLE ACCESS BY LOCAL INDEX ROWID | DIMENSION_TABLE 36 -
> access("FACT_TABLE"."SITE_ID"="DIMENSION_TABLE"."SITE_ID" AND
> "FACT_TABLE"."KEY_FIELD"="DIMENSION_TABLE"."KEY_FIELD") 37 - filter((CASE
> "FIELD1" WHEN 'I' THEN 'Inpatient' WHEN 'O' THEN 'Outpatient' END))*
>
> So we have the join between the FACT and DIMENSION on SITE_ID & KEY_FIELD
>
> But then we have an additional FILTER being applied to the DIMENSION table.
>
> Currently this doesn't kill performance, but definitely adds about 25%
> execution time (6 secs out of 24) and I'm concerned this will get worse as
> data volumes grow.
>
> So the question is , should CASE statement filter really get applied to
> the FACT table and replicate that column (or some version of it) to the
> FACT table instead? (Generally as a rule of thumb I mean)
>
> We can add the CASE statement to the PK index on the DIMENSION_TABLE as
> well, and thats what the ETL team is suggesting, but that seems like the
> wrong way to go.
>

> Chris
>
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Aug 03 2021 - 18:52:55 CEST

Original text of this message