Normalization by Composing, not just Decomposing

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Thu, 8 Apr 2004 13:51:04 -0500
Message-ID: <c546v4$a76$1_at_news.netins.net>



Sorry I have so many questions, but I do appreciate the help I have received from this list. I just read, or rather, skimmed the document Jan pointed me to related to XML and normal forms. There were other more accessible papers there that I skimmed too.

If I am understanding correctly, the process of normalization for any set of data attributes is a process of decomposing from one large set to several smaller ones. That makes sense when starting from scratch.

But the tests for determining whether data is normalized seem to focus only on whether it has been fragmented sufficiently; they do not ask whether the data has been fragmented TOO much.

For example, suppose we have the attributes ID, First Name, Last Name, and Nick Name, where ID is a primary key (or candidate key, if you prefer). For each ID there is precisely one list of Nick Names, and that Nick Name list (relation, if you prefer) is determined by the ID, the whole ID, and nothing but the ID. Even so, in the relational model most folks would still split Nick Names out into a separate relation simply because it is, itself, a relation.

More progressive relational modelers might decide it is OK to model the relation-valued attribute of Nick Names within the first relation. But either option would then be acceptable and considered normalized (using newer definitions of 1NF).
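To make the two designs concrete, here is a small Python sketch (the attribute names beyond the ones above are my own, purely illustrative): the decomposed form splits the nicknames into a second relation keyed by ID, while the composed form keeps them as a relation-valued (here, list-valued) attribute. Reassembling the decomposed form recovers the composed one, which is part of why the split feels like unnecessary fragmentation.

```python
# Two ways to model the same data; values are illustrative only.

# Decomposed form: nicknames split into their own relation, keyed by ID.
people = [
    {"id": 1, "first_name": "Dawn", "last_name": "Wolthuis"},
]
nicknames = [
    {"id": 1, "nickname": "D"},
    {"id": 1, "nickname": "Dawnie"},
]

# Composed form: nicknames kept as a relation-valued attribute.
people_composed = [
    {"id": 1, "first_name": "Dawn", "last_name": "Wolthuis",
     "nicknames": ["D", "Dawnie"]},
]

def compose(people, nicknames):
    """Join the decomposed relations back into the composed form."""
    by_id = {p["id"]: {**p, "nicknames": []} for p in people}
    for n in nicknames:
        by_id[n["id"]]["nicknames"].append(n["nickname"])
    return list(by_id.values())

assert compose(people, nicknames) == people_composed
```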

But there seem to be no "rules," or even guidelines, for when to COMPOSE or keep the Nick Names together with the ID. Such rules are the ones I would add to what I have seen related to XML modeling; they are used, without being explicitly stated, by PICK developers. An imprecise statement of the rule is:

If it is dependent on the key, the whole key, and nothing but the key, then don't split it out!

More precision, but not absolute precision, would give us something like: Let A be the set of all attributes and FD the set of all functional dependencies among them. If a is an element of A and is a key, and mv is another element (named to hint that it might be multivalued, aka relation-valued), and a --> mv is in FD (but no subcomponent of a implies mv), then

mv should be an attribute of a relation in which a is a key, and every attribute b that has this same relationship to a should be in that relation along with mv.
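The rule above can be sketched as a check, under some simplifying assumptions of my own: attributes are strings, FDs are given explicitly as (determinant, dependent) pairs (no closure computation), and the function name is purely illustrative. The check says mv belongs in the relation keyed by a exactly when a --> mv is among the FDs and no proper subset of a determines mv.

```python
from itertools import combinations

def should_stay_with_key(a, mv, fds):
    """Return True if mv should live in the relation keyed by a:
    a -> mv is a given FD, and no proper, non-empty subset of a
    determines mv (i.e. mv depends on the key, the WHOLE key).
    Note: fds are taken at face value; no FD closure is computed."""
    a = frozenset(a)
    fds = {(frozenset(det), dep) for det, dep in fds}
    if (a, mv) not in fds:
        return False
    for size in range(1, len(a)):
        for sub in combinations(a, size):
            if (frozenset(sub), mv) in fds:
                return False  # partial dependency: split mv out
    return True

# Illustrative composite key (course, student): grade depends on the
# whole key, but course_title depends on course alone, so only grade
# should stay in the relation keyed by (course, student).
fds = [(("course", "student"), "grade"),
       (("course", "student"), "course_title"),
       (("course",), "course_title")]
assert should_stay_with_key(("course", "student"), "grade", fds)
assert not should_stay_with_key(("course", "student"), "course_title", fds)
```

With a single-attribute key like ID, the subset check is vacuous, so a list of Nick Names determined by ID alone passes the check and stays put.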

In other words, there ought to be some "rules" that govern when we ought not split out data attributes, in general, as well as when we should decompose them.

Or am I missing something? Perhaps what I skimmed includes this and I just didn't pick it up. I know I haven't read everything out there -- are there other places where normalization, or rules related to data modeling, are not focused exclusively on when to split attributes out, but also cover bringing them together when they have been unnecessarily decomposed?

Thanks. --dawn

Received on Thu Apr 08 2004 - 20:51:04 CEST