Re: Data Warehouse Newbie Questions

From: David Cressey <david_at_dcressey.com>
Date: Tue, 04 Jun 2002 17:24:40 GMT
Message-ID: <sJ6L8.4881$Bz4.23106_at_petpeeve.ziplink.net>


Sergio,

> I have came to the conclusion that I need a "Data Warehouse" ; I have
> read some papers, and still have a few questions on the subject :
>

You might want to read I few books on the subject. I recommend Ralph Kimball's books
on Data Warehousing.

> 1) Suppose I have a 'Sales' "fact table"... some Sales are "basic
> sales", but some are more complicated, hence I need to store more
> "attributes".
> Do I create a new 'Special Sales' "fact table" ? Or is it better that
> 'Sales' table holds all the possible attributes, and have them "NULL"
> when not relevant ? And what about using the "supertype" pattern ( a
> special sale is described by a record in 'Sales' table and a record in
> 'Special Sales' that provides the additional attributes ) ?
> ([ this question relates to one of my previous posts -- and I thank
> people who took time to answer me --, but I would like people's
> opinion in the context of a data warehouse, that will contain up to
> 200 million records ! ])

I don't have a specific answer to your question, because it depends on how you
use the data. In general, if you end up with too many NULLS, and too much testing for NULLS
in your queries, it means that you have defined your facts wrong.

> 2) In a 'Time' dimension, each day, with its week, month, year, etc
> must be entered in the corresponding table. Who does this process ? Is
> it done automatically by some kind of mechanism in the RDMS or must it
> be done manually ?

Usually, it's up to a programmer to write a program that loads the time dimension.
Depending on the DBMS you are using, this can be pretty easy. If there's a column for
something like "company holiday", you'll need a data source for this item.

> 3) Does data REALLY need to be denormalised (I am a newbie !) ? Is it
> necessary or is it just a hacker optimisation ? I have allways been
> taught to normalise !

Star schemas work better when the dimension tables are partly denormalized, down to 1st or 2nd normal form.
There shouldn't be any difference, since dimension tables should have simple keys.

The only time you will have any difficulties will be when updating dimensions. The update process for a star schema is often a challenge no matter how you design it. For queries, the denormalization will generally help you more than it hurts you.

> 4) Do you know of any good web sites on the subject of designing a
> Data Warehouse ?

www.rkimball.com has some good content, and good links.

Regards,

   David G. Cressey Received on Tue Jun 04 2002 - 19:24:40 CEST

Original text of this message