Re: Fails Relational, Fails Third Normal Form

From: Nicola <nvitacolonna_at_gmail.com>
Date: Thu, 05 Feb 2015 13:27:04 +0100
Message-ID: <nvitacolonna-47DD1F.13270305022015_at_freenews.netfront.net>


Hi Derek,
I'd just like to make a remark on your statement in the document you have linked:

"[The schema] breaks Third Normal Form, because in each case, there is no Key for the data to be Functionally Dependent upon (it is dependent, yes, but on a pork sausage, outside the data, not on a Key)."

I hope we all agree that the existence of a key does not depend on a developer's choice, but it intrinsically depends on the meaning of the data. In the Relational Model - the one you like, and I do, too - each and every relational schema always has at least one key, and possibly more than one. I hope we all agree on that, too. To find the keys, you must determine which functional dependencies hold. That, again, depends exclusively on the semantics associated to the attributes.

Now, let's take a look, for instance, at the Country schema. I think we all share more or less the same idea about what a country code is
(you've pointed to the standard in a previous post), or what the name of
a country is. Once we understand the meaning of those attribute names, it is not difficult (in this case) to make the functional dependencies explicit and to determine, for example, that CountryCode is a key (not necessarily the only key in this example, but it does not matter for my argument). Once you have found all the keys, you may tell whether your schema is in 3NF or not (in this case, it is - provided that I interpret correctly, and I suppose I do, the meaning of the remaining attributes).

Now, the fact that you add a surrogate CountryId does not change anything at all: unless we have different concepts of what a (surrogate) key is, we should all conclude that CountryId functionally depends on CountryCode, and vice versa. Even after adding CountryId to the schema, CountryCode remains a key, because that is what our understanding of the data dictates. I hope that this is crystal clear.

So, your statement above does not make any sense, if only because all schemas have at least one key, and all the attributes of a schema functionally depend on each key, by definition (Codd's definition).

I think I understand, however, what your point is, and I totally embrace it (as I bet others would do): when your client (unnecessarily) defines a schema with a surrogate key, he "forgets" about the other keys and data integrity is thrown out of the window. Note, however, that a statement like "there are no keys on the data" is still inappropriate. What you should rather complain about is that that logical schema design fails to capture the real constraints on the data. Technically, what your client has designed is a schema with only one functional dependency of all the attributes from the surrogate key. Given that, in a formal sense the schema is in 5NF - wrt *that* set of dependencies (which is *not* the set of "real" dependencies, that is the constraints that you have in mind when you think about countries, country names, etc...).

I also agree that, by designing that way, the database becomes akin to a "record filing system". I wouldn't go as far as saying that it is "not relational", but certainly it is using the relational model in a bad way.

So, Jan is correct to point out that the normal form has nothing to do with how good or bad a logical design is. You can get a highly normalized schema from a completely wrong set of constraints. You should complain about the latter in the first place.

Finally, talking about constraints and the meaning of the data, I have a
(genuine) question: does it really happen that two addresses differing
only in their Unit have different post codes (in my country, that is
never the case, I think)? Aren't different units assigned when the same building has more than one entrance?

Nicola

Received on Thu Feb 05 2015 - 13:27:04 CET

Original text of this message