Re: How to normalize this?

From: Erwin <e.smout_at_myonline.be>
Date: Wed, 1 May 2013 06:08:38 -0700 (PDT)
Message-ID: <442430a4-20dc-4149-868b-e9cf8e830cac_at_googlegroups.com>


Op woensdag 1 mei 2013 11:20:47 UTC+2 schreef Jan Hidders het volgende:
> On 2013-04-30 22:17:44 +0000, Erwin said:
>
>
>
> > It's often bothered me that normalization theory/procedure seems to
>
> > quietly ignore the notion of "nullability or not" of any of the
>
> > attributes in the "initial table design" ...
>
>
>
> Nullability doesn't really matter for normalization. From a redundancy
>
> point of view null is just another value.

Well, fair enough I suppose, but not without noting that in this particular context, null is then supposed to be a value that is at least equal to itself :-)

>
>
>
> > Iow, that normalization theory per se doesn't actually allow to
>
> > determine when and when not the phenomenon is "an artifact of the
>
> > iniital table design".
>
> >
>
> > Yet iow, that normalization theory doesn't seem to bother whether or
>
> > not the "initial table design" is in fact an accurate relational
>
> > representation of the business problem ...
>
>
>
> Keeping the schema equivalent in information content wasn't a goal
>
> anyway,

I agree that it wasn't.

> and it shouldn't be,

But I think I disagree quite vehemently that it shouldn't ... in our age.

> otherwise we could not remove certain
>
> update anomalies.

"Update anomalies" are a laudable concept to consider, and eliminating them a laudable goal to try and achieve.

Normalization theory does quite a good job at that, as far as it goes, and that's good.

But I think (as of this point I'm nowhere near sure of what I'm saying so pls take with considerable slack and tons of salt) the concept of "update anomalies" are only a proper subset of the "difficulties for the database user to update the database correctly". That's because :

(a) all FDs correspond to constraints in the ultimate logical design,
(b) not all constraints correspond to FDs
(c) and the "difficulties to update the database correctly" derive from _all_ the constraints, not just from those that correspond to FDs.



Therefore, "minimizing the number of potential update anomalies" cannot and does not necessarily imply "minimizing the number of potential constraint violations a user can get when updating the database".

And the latter is exactly what I want, and what I think is needed, in database design. Because achieving _that_ implies the minimal number of considerations to make about constraints for the programmer writing the update statements. (and I think it is indeed _that_ which should be "minimized".)

To exemplify : if I were to run the OP's example on SIRA_PRISE, which has both multiple assignment and arbitrary complexity database constraints, I would _never_ opt for the three-table design that comes out of your dependency preserving normalization (and I would frown quite heavily at someone who would) !!!

I might _even_ leave the 5-column thing simply as is, define two projection views {b,c,e} and {c,d,e} on it and declare the appropriate keys {b,c} {c,d} on those projection views !!!!!!!

> And I would argue that in practice the different
>
> components usually turn out to be independent facts anyway. The fact
>
> that they have different underlying dependencies already sort of hints
>
> at that. But if you have a realistic example that would show otherwise,
>
> that would be interesting.

Not sure if I'm thinking of the same kind of thing as you, but I'm thinking of CUSTOMERs that have an AGE, where AGE determines [some kind of] COMMERCIAL _CLASS.

AGE and COMMERCIAL_CLASS don't become facts that are "independent" from there being a CUSTOMER to begin with. Received on Wed May 01 2013 - 15:08:38 CEST

Original text of this message