Re: Normalization by Composing, not just Decomposing

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Thu, 8 Apr 2004 15:20:22 -0500
Message-ID: <c54c6j$cbq$1_at_news.netins.net>


"Alan" <alan_at_erols.com> wrote in message news:c54a0e$2ohurg$1_at_ID-114862.news.uni-berlin.de...
> You are assuming that (good) normalization is a science. It is not. It is
> part science and part art -- that's where experience (as well as ESP to
> read the users' minds and clairvoyance to predict future needs) comes
> into play. Oh, it is also part voodoo. Sometimes waving a dead chicken in
> a paper bag over your head produces the results you need.

You are preachin' to the choir-ish -- that's the type of thing I would say if I were not trying, oh so hard, to learn what makes relational theorists tick, and trying, oh so hard, to use the same way of thinking so that I can really learn what it is about relational theory that keeps it king of the hill. There are very formalized statements, in very mathematical terminology and all, showing various normalization problems to be NP-complete or whatever else makes some folks feel all warm and fuzzy (not the mathematical use of the term "fuzzy"). When I ask what it is about relational theory that makes it king, I hear that it is because it is based on mathematics, including predicate logic.

> By the way, the process of
> putting it back together is called denormalization, not composing, and is
> not uncommon, but as you noted, there are no rules. That's why experienced
> data modelers get paid more than newbies.

Yes, you are right, and I'm quite familiar with denormalization as used with OLAP, which is why I avoided that term. From what I have seen, folks talk about denormalization when moving away from transactional data processing, and I didn't want to inadvertently take the thread in that direction.

So, given your statements about formalizing the rules for good data modeling/design/implementation, are you in "the relational camp" or among the less orthodox (of us)? Thanks. --dawn

>
> "Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
> news:c546v4$a76$1_at_news.netins.net...
> > Sorry I have so many questions, but I do appreciate the help I have
> > received from this list. I just read, or rather, skimmed the document
> > Jan pointed me to related to XML and normal forms. There were other,
> > more accessible papers there that I skimmed too.
> >
> > If I am understanding correctly, the process of normalization for any
> > set of data attributes is a process of decomposing from one large set
> > to several smaller ones. That makes sense when starting from scratch.
> >
> > But tests for determining whether data is normalized also seem to focus
> > on whether it has been fragmented sufficiently and do not take into
> > account whether the data has been TOO fragmented.
> >
> > For example, suppose we have the attributes ID, First Name, Last Name,
> > and Nick Name, where ID is a primary key (or candidate key, if you
> > prefer), for each ID there is precisely one list of Nick Names, and the
> > Nick Name list (relation, if you prefer) is determined by the ID, the
> > whole ID, and nothing but the ID. In the relational model, most folks
> > would still split Nick Names out into a separate relation simply
> > because it is, itself, a relation.
> >
> > More progressive relational modelers might decide it is OK to model the
> > relation-valued attribute of Nick Names within the first relation. But
> > either option would then be acceptable and considered normalized (using
> > newer definitions of 1NF).
> >
> > But there seem to be no "rules", or even guidelines, for when to
> > COMPOSE, or keep together, the Nick Names with the ID. Such rules
> > would be the ones I would add to what I have seen related to XML
> > modeling, and they are used, without being explicitly stated, by PICK
> > developers. The imprecise description of this rule is:
> >
> > If it is dependent on the key, the whole key, and nothing but the key,
> > then don't split it out!
> >
> > More precision, but not absolute precision, would give us something
> > like this: Let A be the set of all attributes and FD the set of all
> > functional dependencies among those attributes. If a is an element of
> > A and is a key, and mv is another element (named to hint that it might
> > be multivalued, a.k.a. relation-valued), and a --> mv is in FD (but no
> > proper subset of a determines mv), then:
> >
> > mv should be an attribute of a relation in which a is a key, and for
> > every attribute b with this same relationship to a, mv should be in
> > the same relation as b.
> >
> > In other words, there ought to be some "rules" that govern when we
> > ought NOT to split out data attributes, in general, as well as when we
> > should decompose them (see the Python sketch below).
> >
> > Or am I missing something? Perhaps what I skimmed includes this, but I
> > just didn't pick it up. I know I haven't read everything out there --
> > are there other places where normalization, or rules related to data
> > modeling, are not focused exclusively on when to split attributes out,
> > but also include bringing them together when they have already been
> > unnecessarily decomposed?
> >
> > Thanks. --dawn
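
To make that composing rule concrete, here is a rough Python sketch. The
attribute names, the sample nicknames, the dict/list encoding of the two
designs, and the keep_together predicate are my own illustrative
assumptions -- nothing from the papers or products mentioned above -- but
they show the Nick Name example both ways, plus the "keep it together"
test as a predicate over a set of functional dependencies:

# Design 1: fully decomposed -- Nick Names split into a second relation.
person = {
    7: {"first_name": "Dawn", "last_name": "Wolthuis"},
}
nicknames = [  # (person_id, nick_name), one row per nickname
    (7, "dawn"),
    (7, "dmw"),
]

# Design 2: composed -- the nickname list kept as a relation-valued
# (multivalued) attribute inside the one relation, PICK-style.
person_composed = {
    7: {"first_name": "Dawn", "last_name": "Wolthuis",
        "nick_names": ["dawn", "dmw"]},
}

def keep_together(attr, key, fds):
    """True if attr should stay in the relation keyed by key, i.e. attr
    is determined by the key, the whole key, and nothing but the key.

    fds is a set of (determinant, dependent) pairs, with each
    determinant given as a frozenset of attribute names."""
    determinants = {lhs for (lhs, rhs) in fds if rhs == attr}
    whole_key = frozenset(key)
    if whole_key not in determinants:  # dependent on the key...
        return False
    # ...the whole key and nothing but the key: no other determinant,
    # so no proper subset of the key and no non-key attribute either.
    return all(lhs == whole_key for lhs in determinants)

fds = {(frozenset({"id"}), "first_name"),
       (frozenset({"id"}), "last_name"),
       (frozenset({"id"}), "nick_names")}

print(keep_together("nick_names", {"id"}, fds))  # True: don't split it out

When keep_together comes back True, Design 2 is the shape the rule above
argues for; when it comes back False, ordinary decomposition applies.
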
Received on Thu Apr 08 2004 - 22:20:22 CEST
