Re: Normalization by Composing, not just Decomposing

From: Alan <alan_at_erols.com>
Date: Tue, 13 Apr 2004 09:49:55 -0400
Message-ID: <c5gr5c$1dbtj$1_at_ID-114862.news.uni-berlin.de>


In line, again --->>>

"Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message news:c5f1cm$qo5$1_at_news.netins.net...
> "Alan" <alan_at_erols.com> wrote in message
> news:c5euf6$v5g3$1_at_ID-114862.news.uni-berlin.de...
> > See in line... --->
> >
> >
> > "Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
> > news:c5esfv$hi$1_at_news.netins.net...
> > > "Alan" <alan_at_erols.com> wrote in message
> > > news:c5erh9$qt9b$1_at_ID-114862.news.uni-berlin.de...
> > > > Normalization rules are Codd's rules, not God's rules. They are a
> > > > _guide_ to distributing data among entities, not a dogmatic recipe.
> > > > You seem to want to project a certain amount of dogmatism on
> > > > everything, as if life were black and white. It isn't, it's an
> > > > infinite number of shades of gray (well, 16,384 at least).
> > >
> > > Who, me? Perhaps it comes across that way because I am new to doing
> > > any sort of significant study of database theory, and compared to
> > > everything I have worked on before, relational theory IS very tightly
> > > pre/described with a mathematical model. The model is tight, but that
> > > doesn't mean it is equally tightly implemented.
> >
> > ---> In the strictest sense, yes, as a theory, but in practice, after
> > the rules are followed, you go back and bend them quite often. I think
> > of it as a starting point, not an ending point -- perhaps that's the
> > explanation you're looking for.
>
> I don't think that is the general opinion of relational theorists, but
> even given your approach, I'm contesting the "starting point". I don't
> mind having some logically-based rules from which to deviate, but I'd
> rather start with rules closer to what I want to end up with, and 1NF is
> my biggest issue with the normalization "rules" from relational theory.
>
> > > I'll agree completely with the infinite number of shades of gray --
> > > I'm guessing even uncountably infinite.
> > >
> > > > Did you hear about the programmer they found dead in the shower?
> > > > He was stiff, grasping a bottle of shampoo, his eyes apparently
> > > > fixed on the instructions: "Lather, rinse, repeat."
> > > >
> > > > Here's the general rule of thumb: normalize to 3NF, and then see
> > > > if that works for you in your situation. If it doesn't, then
> > > > denormalize or normalize further. Iterate.
> > >
> > > But I don't want to put the data in 1NF -- there is no reason to do
> > > so from my perspective. Since all other normal forms require the data
> > > to be in 1NF first, that pretty much kills the process as it is
> > > written. However, since I can put data into (2NF - 1NF) and
> > > (3NF - 1NF), that is what I do, and I then proceed as you describe to
> > > refactor the model until it fits.
> > >
> >
> > ---> Ah, the rules DEscribe the data as being in 1NF first; they do
> > not PREscribe the data to be in 1NF first. You find the data however
> > it is, and then move it around as necessary. The data may happen to be
> > in 1NF when you start, but maybe it isn't. Often, especially after
> > some experience, you start out by placing the data you are familiar
> > with in 3NF, and then see where the rest goes. For example, an HR
> > database: well, you know you have EMP and DEPT, and you have a pretty
> > good idea what's going to go in them. No sense starting in 1NF when
> > you already know where you're going. OTOH, if you are working in a
> > domain about which you know very little, you have no choice but to
> > start by asking a lot of questions and placing the data in 1NF, and
> > take it from there. Note that this is coming from an implementer's
> > POV, not a theorist's. The theorists are probably ready to explode by
> > now.
> >

>

> Data is rarely in 1NF when you start looking at a problem domain. And I
> have "no choice" but to put the data in 1NF under certain circumstances?
> I beg to differ -- I can name that tune with 2 & 3NF sans 1NF, but I'm
> not saddled with an RDBMS as the target environment. And, by the way,
> the theorists on the list have been, for the most part, generous in
> handling the questions of this "implementor" with relatively minor
> explosions ;-)

--->>> I wrote that when you are unfamiliar with the problem domain, you will need to put the data in 1NF first. You simply cannot jump right to 3NF. If you are familiar with the problem domain, then, yes, it is possible to go right to 3NF, to a degree that depends on your experience. The more familiar you are, the more correct your 3NF will be.
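
To make that concrete, here is a minimal sketch of the progression (all
table and column names are invented for illustration, and the DDL is
generic SQL):

    -- Not in 1NF: a repeating group of phone numbers crammed into one
    -- attribute -- the sort of multivalued field PICK stores natively:
    --   EMP(emp_id, name, dept_name, dept_loc, phones = "555-1234;555-9876")

    -- 1NF: flatten the repeating group into rows, one value per column.
    CREATE TABLE emp_flat (
        emp_id    INT,
        name      VARCHAR(40),
        dept_name VARCHAR(40),
        dept_loc  VARCHAR(40),
        phone     VARCHAR(20),
        PRIMARY KEY (emp_id, phone)
    );

    -- 2NF/3NF: remove the partial dependency (name depends on emp_id
    -- alone, not on the whole key) and the transitive dependency
    -- (dept_loc depends on the department, not the employee).
    CREATE TABLE dept (
        dept_id INT PRIMARY KEY,
        name    VARCHAR(40),
        loc     VARCHAR(40)
    );

    CREATE TABLE emp (
        emp_id  INT PRIMARY KEY,
        name    VARCHAR(40),
        dept_id INT REFERENCES dept (dept_id)
    );

    CREATE TABLE emp_phone (
        emp_id INT REFERENCES emp (emp_id),
        phone  VARCHAR(20),
        PRIMARY KEY (emp_id, phone)
    );

If you already know the domain, you write the last three tables directly
and skip the middle step; if you don't, the 1NF table is where the
questioning starts.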

>
> > > > You ask, "Are you adhering to relational theory when deploying an
> > > > OLAP database where the data is in fact & dimension tables?" Did
> > > > you read my message? What did I write about normalization and the
> > > > star schema?
> > >
> > > I'll re-read and see if it is clearer, but it seems to me that you
> > > were making a pitch for OLAP data being "relational" in some way.
> > > Sorry if I misunderstood.
> >
> > ---> No, I was making a pitch that it is not necessary, and can even
> > be confusing, to view OLAP data as being relational in some way. Read
> > Ralph Kimball (the "father" of data warehouses). He refers to what we
> > are talking about as "dimensional modeling", and as something that is
> > _applied to_ relational databases. Note that he is _not_ talking about
> > relational modeling.
>
> Yes, and, in fact, Kimball's "dimensional modeling" is what I was
> referring to when talking about modeling cubes. Modeling stars or
> snowflakes for fact & dimension tables IS cube-modeling.

----->>> If you had ever created a cube, you would know that creating a star schema is not modeling a cube. A star schema is a logical representation of the data, arranged a certain way in entities. A cube is but one (albeit perhaps the most common) possible physical implementation of the star schema. We may actually agree in general -- we could be getting hung up on semantics.
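
To illustrate the logical side (a minimal sketch; the table names are
invented): a star schema is just ordinary relational DDL -- a fact table
whose composite key is nothing but foreign keys into the dimension
tables.

    -- Dimension tables: descriptive attributes.
    CREATE TABLE dim_date    (date_id    INT PRIMARY KEY, cal_date DATE, cal_year INT);
    CREATE TABLE dim_product (product_id INT PRIMARY KEY, name VARCHAR(40));
    CREATE TABLE dim_store   (store_id   INT PRIMARY KEY, region VARCHAR(40));

    -- Fact table: the composite key is all foreign keys; everything
    -- else is additive measures.
    CREATE TABLE fact_sales (
        date_id    INT REFERENCES dim_date (date_id),
        product_id INT REFERENCES dim_product (product_id),
        store_id   INT REFERENCES dim_store (store_id),
        qty        INT,
        amount     DECIMAL(10,2),
        PRIMARY KEY (date_id, product_id, store_id)
    );

The cube is then built _from_ tables like these by pre-aggregating the
measures across the dimensions -- a physical artifact, not the model.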

>

> >
> > >
> > > > Data is not _modeled_ as "OLAP cubes". Cubes are an
> > > > implementation; modeling is analysis and maybe design.
> > >
> > > Cubes could be an implementation, but they could also be used to
> > > model the data. There is not always a need to take the data, put it
> > > into a relational data model, and reform it for OLAP -- one could go
> > > from requirements to OLAP data model, right? --dawn
> >
> >
> > ---> Yes and no. How would you model an eight (or more) dimension cube?
>
> With a fact table with an 8-part key pointing to eight dimension tables
> (or something like that).

----->>> No, that is a star schema, not a cube. If you've ever created a pivot table in Excel (minus the data, to make it a model only), you have an idea of what a cube model begins to look like. Cognos PowerPlay Transformer is a cube modeler and constructor. Google it -- I haven't, but there is probably a white paper with examples.
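
If you want the flavor of it in SQL terms (a rough analogy only, and the
GROUP BY CUBE syntax is dialect-dependent), a cube engine in effect
pre-computes and stores every aggregate combination -- subtotals and
grand total included -- that a query like this would produce against the
star schema sketched earlier:

    -- All combinations of the three dimensions plus every subtotal
    -- rollup: roughly the set of cells a cube pre-materializes, and
    -- what an Excel pivot table computes on the fly.
    SELECT d.cal_year, p.name, s.region, SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_date    d ON d.date_id    = f.date_id
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY CUBE (d.cal_year, p.name, s.region);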

>

> > I don't think it is practical, as models are used for analysis and
> > design. You do build a model of the cube (using various cube-creation
> > software) before you actually create the cube, but you really can't
> > easily model from scratch (perform analysis) with it. You can readily
> > model a star schema, however, and you then build the cube model from a
> > star schema. While you can take "loose" data and turn it into a star
> > schema, the practical reality is that you are almost always starting
> > out with an existing relational database, so it is wise to discuss it
> > from that point of view.
> >
> Seems like a terminology issue, so I'll use yours -- in that case I was
> referring to dimensional modeling of data, which is not the same as
> relational modeling and is also, typically, not used for the same
> purposes. Oh, wouldn't it be oh-so-grand if we could model the data and
> use that same model for both OLTP & OLAP? Although the modeling for PICK
> is neither relational modeling (because not in 1NF) nor dimensional
> modeling, PICK software developers seem to get away without data marts
> or warehouses, using the operational data, including stored historical
> data, for most of their reporting needs. This is, in part, due to the
> fact that many PICK shops are in the small-to-midsize business category
> rather than in large businesses. But there might be something more to
> it: perhaps there isn't as much need to take the data and reformat it
> for reporting/analysis purposes. Hmmmm. I'll have to think more about
> that. --dawn
>

> <snip>
>
>
>
Received on Tue Apr 13 2004 - 15:49:55 CEST
