Re: Dimensional modelling

From: Ben Ryan <benryan_at_my-deja.com>
Date: Sun, 31 Oct 1999 21:12:40 GMT
Message-ID: <7vibc7$p3n$1@nnrp1.deja.com>

In article <7vgeun$imi$1_at_nnrp1.deja.com>, Vikas Agnihotri <vikasa_at_writeme.com> wrote:
> First of all: is there a more appropriate forum for DW modelling
> questions? c.d.olap doesnt quite seem to cut it.
>
> Couple of questions about dimensional modelling:
>
> 1. How does one set up a star schema at the transaction-level where
> there could be multiple "parties" in a transaction?
>
> In a relational model, you simply have a "master" table and one row
> per party in a detail table.
>
> If the number of parties is fixed, (similar to the "location"
dimension,
> where you could have multiple locations in a star), you could simple
use
> the same "party" dimension repeatedly with views or synonyms or
> somesuch.
>
> But in a star schema, you cant have a "variable" number of dimensions.
> Whats the answer?
>
> 2.
>
> Say I have a workflow based operational system. Workunits enter the
> system, are operated upon, enter various statuses depending on various
> factors, move on to a different work-queue, and finally exit the
system.
>
> What if I want to DM this system at the lowest grain i.e. indivual
> transactions?
>
> How do I model the various work-flow steps, status changes, time
between
> status changes, etc, etc? What kind of dimensions would I need? Would
> the transaction detail table in the OLTP system map to a "dimension"
> table?
>
> Vikas

You cannot create Dimensional model if you do not have some idea of what the output (various aggregate reports) will be reporting on. Specifically you need to be able to work out which bits of data will become "facts" and which bits of data will be used to categorize the data (dimensions). Imagine a typically management report that you see in a spreadsheet. It has row headers and column headers. Those are the dimensions required. The data shown for each column and row combination in the spreadsheet goes in the fact table (the central table of your star query.) Starting with the existing OLTP system's table schema is not the place to start.

If you had a requirement to report on, say, the average time taken to do a work flow STEP, broken down by party and the status reached at the end of the step. Then your "imaginary spreadsheet" could have the status as column headers and party as row headers.

To model this your fact table would consist of a row per work flow step with a database column containing the time taken. There would be two dimensions, party and status, so your fact table would need two foreign keys.

Your OLTP detail table might contain the start time and end time, however you only need to store the difference between the two times in the fact table. You might have to allow for the requirement that time taken should be how long the party was able to work on the step, not the elapsed time, (i.e. ignore dead time, like weekends). You would do the time adjustment when you load the fact table from your OLTP detail table, rather than trying to work it out on "the fly" when a report was run.

The end result is that your star schema may have little in common with your OLTP schema in terms of structure. Hence my earlier recommendation that you do not start your design process with the OLTP schema.

So the answer to your first question cannot be answered until you know what sort of reports your system will be expected to produce and specifically how they will be expected to show a variable number of parties per work unit.

Having said all that; the answer to your last part of your second question is it would probably be your fact table.

Sent via Deja.com http://www.deja.com/
Before you buy. Received on Sun Oct 31 1999 - 15:12:40 CST