Re: Fails Relational, Fails Third Normal Form

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Sun, 8 Feb 2015 18:59:33 -0500
Message-Id: <20150208185933.59eada02.jklowden_at_speakeasy.net>


On Sat, 7 Feb 2015 17:46:38 -0800 (PST)
Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:

> > > I think the pivotal difference is, Codd and I require a Key,
> > > before Functional Dependency can be worked with, and you people
> > > have non-FDs, and on non-keys.
> >
> > The formal definition of a key depends on the notion of FD: FDs
> > come first, keys can be derived from them. I don't know what you
> > mean by "non-FDs", but FDs on "non-keys" exist, because the reality
> > to be modeled mandates it.
>
> Totally rejected.
...
> 4. I am quite aware of the abstract notion that non-FDs can be used
> to determine the key.... That is relevant only when you have a
> Record Filing system, devoid of Keys with meaning, focusing on
> non-keys that have no meaning, to determine that such non-keys are
> somehow "valid". [...]
>
> All that [4] is totally irrelevant because it is not necessary for
> the requested task, because the Keys have not been correctly
> determined.

I think this exchange illustrates a difference in tradition that you feel is idiotic but is really just a question of what one assumes.

From your point of view, you have a customer and a system that maintains addresses in a particular place and time. The columns have meaning (exemplified in their names) and the keys more or less announce themselves to you. Any ambiguity or error can be addressed by discussing them with your customer. I.e., by agreeing on their meaning. Any formalism with FDs is pointless.

From the academic point of view -- indeed from the point of view of the DBMS, as you know -- no column has meaning. It has a type and domain, and some relationship (perhaps functional dependency) on other columns. Any statement about keys *must* be based on stated FDs.

Your corrective notes end with what really is all that need be said,

> * Find out what the data is, what it means, how it relates to
> all other data in this cluster.

That is an option open to your group and not generally to c.d.t.. Anyone here willing to make an assertion about the correctness of the model must also be willing to make assumptions about the meaning of the columns. However safe those assumptions might be, they are still only assumptions.

Surely you agree that to be unwilling to make assumptions like that need not be an exercise in stupidity or obfuscation.

An example of the difficulty arising from unclear meaning is the discussion over the relationship of postal code to unit. I would never have guessed there are jurisdictions in which they are 1:1, but you said (IIUC) that there are. Other questions arise, too. When I looked at
http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20B.pdf I found myself wondering about StreetName and StreetType. I couldn't think of an application for which those tables would be useful. They're not objectively wrong, but I assume they are.

I remember a different example in my work. We had two tables, Countries, and CountryGroups. Each Country had and ISO code and was the real deal. CountryGroups reflected various political designations and business imperatives.

One fine day a developer wrestling with an "application problem" (his term) asked for comment on his preferred solution: to add one row to Countries named "all countries". (You can imagine my reaction and I yours!) I would say the suggestion stemmed fundamentally from a failure to understand the meaning of the Countries table. To the man in question, the table had no meaning per se, and the "missing" row was a deficiency. I suggested, colorfully, that the concept of "all countries" belonged squarely in CountryGroups. Obvious as that may be to you, it took quite a lot of persuasion to prevent corrupting a basic domain table. Meaning is surprisingly hard to pin down.

LIke you, I learned about 3NF from an informal description. I don't know how many treatises I've read describing an algorithm based on FDs; they all read to me like the How to Hunt Elephants (e.g., "COMPUTER SCIENTISTS"
http://paws.kettering.edu//~jhuggins/humor/elephants.html): sure to succeed if ever it finished, and unnecessary in my context.

Unlike you, I don't think the FD formalism is an exercise in navel gazing. Would only that the described algorithm were implemented, and we could pour our column definitions in and get a 3NF (say) logical model out!

The problem as I see it isn't in the formalism per se, but in describing the columns' meaning to the algorithm. You do that in your head and depict the result with IDEF1X. I've done the same. Are you prepared to say that's the last and best way? I'm not. I'm still waiting for an FD language (loosely speaking) that will describe my database better than SQL, from which I can generate an IDEF1X diagram and matching SQL DDL. That would be a better way to work, and would be fruit from FD tree.

--jkl Received on Mon Feb 09 2015 - 00:59:33 CET

Original text of this message