Re: Fails Relational, Fails Third Normal Form

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Sun, 15 Feb 2015 20:08:02 -0500
Message-Id: <20150215200802.3c8c8bae.jklowden_at_speakeasy.net>


On Thu, 12 Feb 2015 10:52:54 +0100
Nicola <nvitacolonna_at_gmail.com> wrote:

> In article <20150212012341.d8329e51.jklowden_at_speakeasy.net>,
> "James K. Lowden" <jklowden_at_speakeasy.net> wrote:
>
> > Quoth Nicola on Thu, 05 Feb 2015:
> > > A few years ago I implemented a few algorithms from Koehler's PhD
> > > thesis in a Ruby script. Given a set of FDs, the script finds all
> > > the keys and all the minimal covers....
> >
> > My first reaction is a little unkind. I think this is what lawyers
> > call "assuming facts not in evidence". *Given* a set of FDs, the
> > program generated a 3NF database design. Hurray! Now, where to
> > find those pesky FDs for input?
>
> From the requirements? I'm not sure I grasp what your point is.
> Do you mean that turning some requirements into FDs is difficult (you
> would not be alone: Stonebraker says that "mere mortals do not
> understand FDS")? Or that FDs (and even MVDs, JDs, etc) comprise only
> a tiny fraction of real system's set of requirements, so it's not
> worth bothering?

Yes, "turning some requirements into FDs is difficult" and, no, I'm not suggesting it's not worth bothering.

Part of Derek's argument is that an E-R diagram is the best way to capture the requirements. He has a tool that lets him nominate primary keys and designate FK relationships (and enforce some other domain constraints). He doesn't see any point to FD formalism because that process lets him create solid designs without it.

The problem with his process IMO is that the proof of the pudding is only in the eating. Because there is no formal FD description, there is no verification that the design reflects them.

> Computationally, database design problems are (highly) intractable in
> general. But in practice many problems can be solved. For example,
> finding the keys of a schema (given the FDs) is an inherently hard
> problem, but for schemas with up to a few tens of attributes or so,
> only in few cases you are not able to find them in a reasonable
> amount of time. On the contrary, finding all the minimal covers is
> much less practical, even when there are not many of them.

Even if the problem is NP hard in general, I suspect there are ways to prune the problem space such that more than "tens" of attributes is feasible. (Unfortunately, not many people interested in that kind of work are addressing themselves to this particular problem.) If human beings are able to design normalized databases with hundreds of attributes (1000 is a lot, 10,000 I've never heard of), they must be doing something the machine could also do. To me that's prima facia evidence the algorithms we're using are naïve.

Granted, that's a hand-wave, and you'd be justified to answer that if it's so easy, maybe I should just show everyone how it's done. I'm not saying it's easy. I'm saying we're making progress on other computationally hard problems by limiting the problem space, and that there's evidence this field could profit from the same work.

> FDs are a special type of first-order formulas. I can imagine
> defining an English-like syntax for them.

I'm sure, but the problem is not as simple as you suggest.

Notational convenience is a big deal, and more art than science. If I may say so, the field of databases hasn't produced much in the way of compilable languages. We got SQL (not from a lab) and stopped.

I had to learn the value of notation the way I learn everything: the hard way. Some years ago I took up an interest in dependency tracking for software packages. To represent packages and their dependencies in a relational database design is no big deal, maybe 10 tables and 50 attributes IIRC. But, how was the user of my database going to present the dependencies to it? With a sequence of INSERT statements? Hardly. Thus did I find myself in the world of language design when I thought the problem was only one of satisfying dependencies.

I believe a language could be devised that requires less input from the user than do current systems, and yet is complete in the sense that it defines sufficient input from which to generate a design. I have not seen that language yet. The tersest "language" yet devised for the purpose is graphical. Again, though, hardly a hotbed of research.

--jkl Received on Mon Feb 16 2015 - 02:08:02 CET

Original text of this message