Re: Fails Relational, Fails Third Normal Form

From: Nicola <nvitacolonna_at_gmail.com>
Date: Tue, 17 Feb 2015 11:37:57 +0100
Message-ID: <nvitacolonna-7502F4.11375717022015_at_freenews.netfront.net>


In article <20150215200802.3c8c8bae.jklowden_at_speakeasy.net>,  "James K. Lowden" <jklowden_at_speakeasy.net> wrote:

> Yes, "turning some requirements into FDs is difficult" and, no, I'm not
> suggesting it's not worth bothering.

You might replace "FDs" with any other formalism, and it will still be true. Turning requirements into anything formal is always going to be difficult. That's where intuition, experience, and communication skills play a role, and where mathematics has little or nothing to say. If you are designing a database for an application domain you are not familiar with, just understanding the vocabulary and the relevant concepts may be challenging. Some time ago, it took us several meetings to nail down what a call center company meant by "services", "activities", "contacts", "sessions", "events", and so on and so forth, and to tie these things together into a coherent design. The toughest questions were not of the kind: "What are the keys of this thing?", but of the more primitive kind: "What is this thing?" :)

When you have a thorough understanding of the domain, determining which requirements can be translated into a given formalism, and perform such translation, is comparatively easy (if you know your formalism, I mean). Here, a good balance between expressiveness and simplicity is key (pun intended).

> Part of Derek's argument is that an E-R diagram is the best way to
> capture the requirements. He has a tool that lets him nominate primary
> keys and designate FK relationships (and enforce some other domain
> constraints). He doesn't see any point to FD formalism because that
> process lets him create solid designs without it.

E-R diagrams (we may include IDEF1X, although IDEF1X is a bit different from typical E-R diagram) have a better balance (in the sense above) than FDs. They are a reasonable approach to design, possibly the best we currently have.

> The problem with his process IMO is that the proof of the pudding is
> only in the eating. Because there is no formal FD description, there
> is no verification that the design reflects them.

His claim is that the only dependencies are those from the keys, and those dependencies are in the diagram. So, they are reflected by design and there's nothing to verify.

My point of view is that, given the current state of the theory, using a formal approach (e.g., using FDs a *starting* point) is not really a superior way to tackle the design problem-and not for the lack of simplicity, but for the lack of expressiveness. It makes sense, however, to apply dependency theory after we have a design, as an enrichment to the design (and as a verification that we haven't missed some important constraint).

> > Computationally, database design problems are (highly) intractable in
> > general.
>
> Even if the problem is NP hard in general, I suspect there are ways to
> prune the problem space such that more than "tens" of attributes is
> feasible. (Unfortunately, not many people interested in that kind of
> work are addressing themselves to this particular problem.) If human
> beings are able to design normalized databases with hundreds of
> attributes (1000 is a lot, 10,000 I've never heard of), they must be
> doing something the machine could also do. To me that's prima facia
> evidence the algorithms we're using are naïve.

Well, typically humans do some kind of "pre-normalization" by recognizing, say, that information about a student and about a course must be in different schemas... This reduces the complexity a lot. If you don't give that information to the machine, then it will have a tough time. Once, just for fun, I fed my script with most of the attributes and the related FDs from a popular web application, to see whether it would output a better schema. No way (it didn't terminate) :)

And yes, I agree that there hasn't been so much algorithmic research, probably because there isn't so much motivation (you may already build solid designs with existing tools, so what?). But when I see certain database schemas in the real world, I can't stop myself from thinking: "If you had let the computer do that, it would have turned out much better!" (Ehm, if it only stopped) :)

Btw, one field where algorithms might turn useful is in the so-called Object-Relational mapping tools. Currently, those tools result in horrible database designs, because they typically build a one-one mapping from classes to schemas and from instance variables to attributes. How about a procedure that infers FDs from a (possibly annotated) object graph and derives a good, normalized, database schema to back that structure?

> I believe a language could be devised that requires less input
> from the user than do current systems, and yet is complete in the sense
> that it defines sufficient input from which to generate a design. I
> have not seen that language yet. The tersest "language" yet devised
> for the purpose is graphical. Again, though, hardly a hotbed of
> research.

Agree.

Nicola

Received on Tue Feb 17 2015 - 11:37:57 CET

Original text of this message