Re: ADR's Normalization question

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Tue, 17 Nov 2009 09:40:35 -0800 (PST)
Message-ID: <3b386236-bb1e-41df-8783-0afbeb802ddb_at_f16g2000yqm.googlegroups.com>


> I suggest you read Ronald Fagin's paper "Multivalued Dependencies and a New
> Normal Form for Relational Databases." Date's "otherwise quite
> independent" criterion exemplifies the kind of relationship defined in the
> paper.

I must have been braindead at the moment: I know and love my MVD's, but despite your having mentioned them by name, I just kept on thinking about how Date goes around it in the quote.

I actually consider MVD's more fundamental to normalization than FD's, because they give a condition for two-way nonloss decomposition that is not only sufficient but also necessary. Of course join dependencies would be better still, but in practice they're nowhere near as easy to spot or wrap your head around than FD's (for each x we have precisely one y) or MVD's (for each x we have a unique set Y). Just about the only nasty thing about full MVD's is that you always have to consider the context in which the attributes appear, so that the concept really only works well when you're designing the schema from the top down, and not from the bottom up. In the other direction you'd probably want to use embedded MVD's, but then IIRC their implication problem is open.

As a funky note about the higher normal forms and more exotic kinds of dependencies... Just about all course notes and practical expositions of normalization I've seen sooner or later tell you 3NF (or perhaps BCNF) is all you need in practice; "most relations in the wild are then in 4NF as well". For the load of crap that is, it's repeated amazingly often. You don't need to have anything in your model beyond firms with sets of subsidiaries, people with families or orders with line items, and you're already well into (E)MVD land. As such it's not a big surprise that one well-known study found about one in five deployed databases to have MVD's which hadn't been factored out (and that was before star schemas became all the craze).

But certainly you never have to consider irreducible join dependencies, right? Wrong again: the very first schema I got to design from the ground up at work in fact had a genuine, irreducible, three-pronged join dependency that had to be explicitly broken down. In that case we had projects with milestones and measurements taken at those milestones. Each project defined which measurements it was using, and the milestones were subsetted as well, from a formal waterfall template shared across the organization. So far we could of course make do with MVD's, but additionally each metric was only defined for a subset of the milestones; you couldn't for example have cost measurements for preproject planning checkpoints where the project hadn't been given a go, because the appropriate cost accounts hadn't in that case been assigned yet. Bang: suddenly you no longer have a two way nonloss decomposition, although a three way one exists. And of course there was lots of assorted wackiness going on besides that, like measurements being optional within the above constraints (the eventual solution had to resort to triggers to check the inclusion dependency against {project, milestone, measure} since Oracle doesn't let you point a foreign key to a view), temporal normalization (measurements could change values after first being entered, and so had to be versioned), partial temporal constraints on the order the milestone data was filled out, measurement targets with a separate life of their own except that they couldn't be set after actuals started to pour in, and so on, and so forth.

I ended up constraining the hell out of the thing even after the basic design was in picture perfect PJ/NF (perhaps even 6NF, I didn't formally check that one), with the result that there hasn't been a single integrity issue with said DB to date. Had I just normalized away the final nulls (in validity end time columns) and done away with surrogates, that baby would probably qualify as my masterpiece.

--
Sampo
Received on Tue Nov 17 2009 - 18:40:35 CET

Original text of this message