Re: Normalization by Composing, not just Decomposing

From: Dawn M. Wolthuis <>
Date: Mon, 12 Apr 2004 16:23:21 -0500
Message-ID: <c5f1cm$qo5$>

"Alan" <> wrote in message news:c5euf6$v5g3$
> See in line... --->
> "Dawn M. Wolthuis" <> wrote in message
> news:c5esfv$hi$
> > "Alan" <> wrote in message
> > news:c5erh9$qt9b$
> > > Normalization rules are Codd's rules, not God's rules. They are a
> _guide_
> > to
> > > distributing data among entities, not a dogmatic recipe. You seem to
> want
> > to
> > > project a certain amount of dogmatism on everything, as if life were
> black
> > > and white. It isn't, it's an infinite number of shades of gray (well
> > 16,384
> > > at least).
> >
> > Who, me? Perhaps it comes across that way bz I am new to doing any sort
> of
> > significant study of database theory and compared to everything I have
> > worked on before, relational theory IS very tightly pre/described with a
> > mathematical model. The model is tight but that doesn't mean it is
> equally
> ---> In the strictest sense, yes, as a theory, but in practice, after the
> rules are followed, you go back and bend them quite often. I think of it
> a starting point, not an ending point- perhaps that's the explanation
> looking for.

I don't think that is the general opinion of relationa theoriests, but even given your approach, I'm contesting the "starting point". I don't mind having some logically-based rules from which to deviate, but I'd rather start with some closer to what I want to end up with and 1NF is my biggest issue of the normalization "rules" from relational theory.

> > tightly implemented. I'll agree completely with the infinite number of
> > shades of gray -- I'm guessing even uncountably infinite.
> >
> > > Did you hear about the programmer they found dead in the shower? He
> > > stiff, grasping a bottle of shampoo, his eyes apparently fixed on the
> > > instructions, "Lather, rinse repeat."
> > >
> > > Here's the general rule of thumb: Normalize to 3NF, and then see if
> > > works for you in your situation. If it doesn't, then denormalize or
> > > normalize further. Iterate.
> >
> > But I don't want to put the data in 1NF -- there is no reason to do so
> from
> > my perspective. Since all other normal forms require the data to be in
> 1NF
> > first, that pretty much kills the process as it is written. However,
> since
> > I can put data into (2NF - 1NF) and (3NF - 1NF) that is what I do and
> > proceed as you describe to refactor the model until it fits.
> >
> ---> Ah, the rules DEscribe the data as being in 1NF first, they do not
> PREscribe the data to be in 1NF first. You find the data however it is,
> then move it around as necessary. The data may happen to be in 1NF when
> start, but maybe it isn't. Often, especially after some experience, you
> start out by placing the data you are familar with in 3NF, and then see
> where the rest goes. For examlpe. An HR database- well you know you have
> and DEPT, and you have a pretty good idea what's going to go in them. No
> sense starting in 1NF when you already know where you're going. OTOH, if
> are woking in a domain about which you know very little, you have no
> but to start by asking a lot of questions and placing the data in 1NF, and
> take it from there. Note that this is coming from an implementers POV, not
> theorist. The theorists are probably ready to explode by now.

Data is rarely in 1NF when you start looking at a problem domain. And I have "no choice" but to put the data in 1NF under certain circumstances? I beg to differ -- I can name that tune with 2 & 3NF sans 1NF, but I'm not saddled with an RDBMS as the target environment. And, by the way, the theorists on the list have been, for the most part, generous in handling the questions of this "implementor" with relatively minor explosions ;-)

> > > You ask, "Are you adhering to relational theory when deploying an OLAP
> > > database where
> > > the data is in fact & dimension tables?" Did you read my message? What
> did
> > I
> > > write about normalization and the star schema?
> >
> > I'll re-read and see if it clearer, but it seems to me that you were
> making
> > a pitch for OLAP data being "relational" in some way. Sorry if I
> > misunderstood.
> ---> No, I was making a pitch that it is not necessary, and can even be
> confusing, to view OLAP data as being relational in some way. Read Ralph
> Kimball (the "father" of data warehouses). He refers to what we are
> about as "dimensional modeling", and as something that is _applied to_
> realtional databases. Note that he is _not_ talking about realtional
> modeling.

Yes, and, in fact, Kimball's "dimensional modeling" is what I was referring to when talking about modeling cubes. Modeling stars or snowflakes for fact & dimension tables IS cube-modeling.

> >
> > > Data is not _modeled_ as "OLAP cubes". Cubes are an implementation,
> > > modelling is analysis and maybe design.
> >
> > Cubes could be an implementation, but they could also be used to model
> > data. There is not always a need to take the data, put it into a
> relational
> > data model and reform it for OLAP -- one could go from requirements to
> > data model, right? --dawn
> ---> Yes and no. How would you model an eight (or more) dimension cube?

With a fact table with an 8-part key pointing to eight dimension tables (or something like that).

> don't think it is practical, as models are used for analysis and design.
> do build a model of the cube (using various cube-creation software) before
> you actually create the cube, but you really can't easily model from
> (perform analysis) with it. You can readily model a star schema, however,
> and you then build the cube model from a star schema. While you can take
> "loose" data and turn it into a star schema, the practical reality is that
> you are almost always starting out with an existing relational database,
> it is wise to discuss it from that point of view.
Seems like a terminology issue, so I'll use yours -- in that case I was referring to dimensional modeling of data, which is not the same as relational modeling and is also, typically, not used for the same purposes. Oh, wouldn't it be oh, so, grand if we could model the data and use that same model for both OLTP & OLAP? Although the modeling for PICK is neither relational modeling (bz not in 1NF) nor dimensional modeling, PICK software developers seem to get away without doing data marts or warehouses and using the operational data, including stored historical data, for most of their reporting needs. This is, in part, due to the fact that many PICK shops are in the small-to-midsize business category rather than in large businesses. But there might be something more to it that there isn't as much need to take the data and reformat it for reporting/analysis purposes. Hmmmm. I'll have to think more about that. --dawn

<snip> Received on Mon Apr 12 2004 - 23:23:21 CEST

Original text of this message