Re: Star vs Snowflake: Looking for good reading material

From: David Cressey <david_at_dcressey.com>
Date: Mon, 13 Nov 2000 21:01:00 GMT
Message-ID: <gEYP5.15014$q4.253093_at_petpeeve.ziplink.net>


If you are ONLY going to access data with a single interface, one that isn't SQL,
then I can't speak to your circumstance. Given how long a warehouse might last,
I would never be confident of such an assumption for myself. I expect to load
warehouse data straight into OLAP tools that speak SQL, and only SQL, as well as OO tools that might use the OO interface.

There are, to my way of thinking, some wonderful examples in the Lifecycle book.

The real question is, "what do you gain by snowflaking?" Space? Time? Simplicity?
The "acid" test?

Kimball addresses space and time resource issues better than I can. The "acid"
test should not apply here, unless you are loading your warehouse in or near real time.

So I'm left with simplicity. Simplicity is somewhat subjective, but here's my take:
I learned star schema design as an alternative to normalized design, but I didn't
really internalize it until I got exposed to something else. That something else is
dimensional modeling.

If you are starting with an E-R model or an object model, and you build a star schema,
it's always going to give you the feeling that you are performing an unnatural act. But
if you start with a dimensional model as the analysis model, a star schema begins to look
very simple, and very natural. I'm far enough down that road now so that I can look back
to designs I did fifteen years ago, and say , "if I had known then what I know now,
I never would have normalized it so much."

My recommendation is that you read, very carefully, the chapters about "An introductory course n dimensional modeling" and "a graduate course  in dimensional modeling". (sorry if I got the chapter headings wrong. I don't have the book in front of me). And, as you read, keep in mind that modeling is not the same as design.

Having said that, there are plenty of circumstances, other than warehouses and data marts,where normalization is still the best design principle, and star schema would be at best a distraction, and at worst a serious obstacle to quality.

Richard MacDonald wrote in message
<9sCP5.8567$mq1.585423_at_bgtnsc04-news.ops.worldnet.att.net>...
>Hi all. I'm currently studying "The data warehouse lifecycle toolkit" by
>Kimball et al.
>They are against snowflakes in general, and they have a number of good
>reasons.
>However, with an OO and 3NF background, I'm having trouble accepting this
>:-)
>I'm also still in a learning stage and may not have all the concepts down.
>
>I'd like some more reading material, perhaps with some examples. Any
>suggestions
>and links appreciated.
>
>An example I can think of is the location portion of an address. We have
>city, county,
>state, zip and so forth. All these are related and constrained. What is my
>grain? The
>smallest chunk, e.g., zip? Can I allow additional tables to support this
>dimension
>table?
>
>What if I make my dimension table a view based on the "root" tables? I know
>I might be trashing performance due to loss of bitmap indexing, but is
 there
>anything wrong with this in principle?
>
>Kimball et al. state that one of the big problems with snowflakes is
>complicating the user interface. I'm actually working via an OO-RDBMS
>mapping tool, so I can handle this. I think. Once again, any experience
>appreciated.
>
>Thanks in advance.
>
>
Received on Mon Nov 13 2000 - 22:01:00 CET

Original text of this message