Re: Fails Relational, Fails Third Normal Form

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Fri, 6 Feb 2015 21:04:21 -0800 (PST)
Message-ID: <ff09d587-e43c-488b-ac2d-506d4e739ade_at_googlegroups.com>


Nicola

> On Thursday, 5 February 2015 23:27:07 UTC+11, Nicola wrote:

First, thank you for joining us.

Second, please forgive the delay in responding, I was busy.

I don't know you yet, so I need to apologise in advance if you find my interjections untoward. I am marking critical points to indicate agreement thus far, such that we establish an agreed set of facts.

> Hi Derek,
> I'd just like to make a remark on your statement in the document you
> have linked:
>
> "[The schema] breaks Third Normal Form, because in each case, there is
> no Key for the data to be Functionally Dependent upon (it is dependent,
> yes, but on a pork sausage, outside the data, not on a Key)."
>
> I hope we all agree that the existence of a key does not depend on a
> developer's choice, but it intrinsically depends on the meaning of the
> data. In the Relational Model - the one you like, and I do, too -

The one that this data model has to confirm to, yes, that the developer declares as conforming.

> each
> and every relational schema always has at least one key, and possibly
> more than one. I hope we all agree on that, too.

Yes.

> To find the keys, you
> must determine which functional dependencies hold.

That is a very slow method, but yes. The tables are dead simple, and the attributes should be well-known to anyone who has any experience at all.

> That, again, depends
> exclusively on the semantics associated to the attributes.
>
> Now, let's take a look, for instance, at the Country schema. I think we
> all share more or less the same idea about what a country code is
> (you've pointed to the standard in a previous post), or what the name of
> a country is. Once we understand the meaning of those attribute names,

Yes, the meaning of data is important.

> it is not difficult (in this case) to make the functional dependencies
> explicit and to determine, for example, that CountryCode is a key (not
> necessarily the only key in this example, but it does not matter for my
> argument).

Yes.

> Once you have found all the keys, you may tell whether your
> schema is in 3NF or not (in this case, it is - provided that I interpret
> correctly, and I suppose I do, the meaning of the remaining attributes).

No, I have stated that it FAILS 3NF.

(There is an interaction going on with Jan, where we have teased out that you people have a fragment of 3NF which you fraudulently call "3NF". Result being, as evidenced here, you accept a total failure, a non-relational Record Filing System, to be satisfactory.)

I think the pivotal difference is, Codd and I require a Key, before Functional Dependency can be worked with, and you people have non-FDs, and on non-keys. While that latter could possibly be relevant when dealing with a relational algebra problem when one is waiting for an act of God while sitting on the toilet, it is irrelevant when evaluating a data model that is intended for implementation.

> Now, the fact that you add a surrogate CountryId does not change
> anything at all: unless we have different concepts of what a (surrogate)
> key is, we should all conclude that CountryId functionally depends on
> CountryCode, and vice versa. Even after adding CountryId to the schema,
> CountryCode remains a key, because that is what our understanding of the
> data dictates. I hope that this is crystal clear.

It is mud.

Ok, the communication is crystal clear.

But the statements are totally false, incorrect, wrong. I can't read the rest of your post. Let us stop and deal with the numerous errors in your para.

Are you sure you know the Relational Model, by Codd, 1970, that you say you like ? (To be clear, I am excluding RM/T where he explored Date's surrogates, and later banned them. I am taking only the RM, which is widely known.) You are not using some bag of fragments that some freaked-out theoretician alleges is the "RM" ?

If you don't mind, if, and when, you achieve a position in which you reject non-Relational piles of crap that fail Relational and 3NF, rather than pass as "5NF, Relational", you may then be in a position to employ the teaching tone. Until then, your "shoulds" and "inappropriates" are more than a little presumptuous, and worse. My uncontaminated thinking results in rejecting his data model using six words; you use six hundred words to justify an acceptance of it. The corollary: if, and when I wish to learn how to use six hundred words to accept totally unacceptable models, I will seek your advice.

It is difficult to take each of your sentences and deal with them, it will consume a great amount of time, akin to explaining the crime to an axe murderer who insists they are following the Ten Commandments. So I won't take that route. I will identify the commandments you have broken, the logic we use in the physical universe to lock up axe murderers, and you can respond.

I trust you are aware that there are two models, Address A, and Address B. Both have been accepted by all of you, using various means to undermine and set aside the RM, and 3NF, and both have been rejected by me. And that the charge you make, re my statement "there are no keys on the data" applies to Address A.

  1. What a Key is, is defined in the RM (that you allege you like, that this data model must comply with). >>" >> Normally, one domain (or combination of domains) of a given relation has values which uniquely identify each element (n-tuple) of that relation. Such a domain (or combination) is called a primary key. >>"

(Codd uses the word Domain in several contexts, the context here is an attribute or column)

(Codd states rules for Alternate Keys, but I don't think we need to visit that point. But this is a good juncture to note the stupidity of the term "candidate keys", which does not exist in the RM, and is yet another trick that is used by you good people to circumvent the strictures of the RM, and to flagellate in the albumin without penetrating the cell wall of the ovum. The forty six years of impotence. Ie. we have the demanded PKs and AKs, and you have the non-relational CKs.)

We contract that, or paraphrase that, to "A key must be made up from the data", which is commonly used by Relational adherents, without losing its original meaning.

A surrogate is not made from the data. Typically, IDENTITY, AUTOINCREMENT, GID, etc, is completely unrelated to the data.

Therefore a surrogate is not a key.

2. The term "surrogate key" is totally false.

A Key has specific properties that a surrogate does not have.

Therefore, to use the term "surrogate key" is a misrepresentation, a fraud, because the user or developer will naturally expect some or all of the basic properties of a Key, and the surrogate has none. Zero.

3. The RM demands >>>row<<< uniqueness.

This is simple to implement (we are talking the logical level, since the data model is declared to be the logical model), by making the Key unique. At the physical level, an unique index is used.

A surrogate does not provide row uniqueness.

Using Address A, because it has surrogates only, no Keys:

3.a Placing the surrogate above the lines does not make it a Primary Key (it fails as per above). One who does so only fools himself.

3.b Using the SQL <PRIMARY KEY> clause does not make a surrogate a Primary Key (it fails as per above). One who does so only fools himself.

Likewise for placing an unique index on the surrogate.

3.c Taking the Country File (implemented in SQL for convenience, and having none of the properties of a Relational table), and let's say CountryId is an IDENTITY column, populated by the server, thus the first column here is CountryCode:

____INSERT Country VALUES ( "MM", Mickey Mouse", ... )
____INSERT Country VALUES ( "MM", Mickey Mouse", ... )
____INSERT Country VALUES ( "MM", Mickey Mouse", ... )

will succeed.

Therefore the surrogate does not provide row uniqueness.

Any and all row uniqueness that is implemented, is implemented via a Key, and only via a Key (of which there may be more than one, in which case, one is the PK, and the others are AKs).

In data model Address A, there are no keys on the data.

4. In data model Address B, he has corrected that one item, the above INSERTS will now fail. (But the file, the data model remains rejected for other reasons not related to your para.)

5. There is one valid use for surrogates, but that scenario is not present here, and thus we don't need to discuss it. I am making the statement, in order to identify that I am not black-or-white about surrogates, that they have an use. And that I have decades of experiences with such properly-used surrogates. As well over 120 assignemnts eliminating improperly-used surrogates.


To now address your para:
> Now, the fact that you add a surrogate CountryId does not change
> anything at all: unless we have different concepts of what a (surrogate)
> key is,

It changes many things, each too substantial to ignore.

> we should all conclude that CountryId functionally depends on
> CountryCode, and vice versa.

I reject the notion entirely.

You are free to use that fragmented understanding of FDs, and note that by that use, you accept models that are non-relational Record Filing systems, that have none of the Relational Integrity, Power, or Speed of the RM, that fail 3NF (the real one not the fragmented bit), as "satisfying 5NF".

You have taken a surrogate (I have proved it is not a Key), and you are equating it to a Key. By using some silliness, that they are "functionally dependent" (again by some strange fragment of "definition", not the real FD definition) on a non-key (hence: donkey, monkey).

> Even after adding CountryId to the schema,
> CountryCode remains a key, because that is what our understanding of the
> data dictates.

There are no keys in data model Address A. No Key on CountryCode.

That has been corrected in the data model [B]. He has one Key on Country, CountryCode.

So my statement that you reference:
> > "[The [Address A] schema] breaks Third Normal Form, because in each case, there is
> > no Key for the data to be Functionally Dependent upon (it is dependent,
> yes, but on a pork sausage, outside the data, not on a Key)."

stands.

It is actually [3], it lies in a continuous body of text that refers to its specific terms and reasoning (prior to that which you quote), including [1] and [2], I won't repeat. That prior text [1] refers to Codd's definition of Key, so there is no reason you should be thinking about a non-key in my [3].

If you have some definition of "key" other than Codd's then you are not Relational. As declared, the data model given has to be Relational. It fails Relational [1][2]. It also fails 3NF [3].

> > This is a classic Record Filing System, anti-relational. It is nowhere near ready for Normalisation, let alone Relational Normalisation.

I determine that in six mins, six words. You (actually all of you) accept his model and argue with my six words, using six hundred words to present your "logic", but you miss the fact that it fails, so the six hundred words are totally irrelevant. The axe murderer using six hundred words to argue that he is keeping the Commandments, while denying the body with the axe still in it. And now you are arguing about the relevance of the six hundred words, without realising (denying ?) that the case has been heard, the axe murderer has been convicted, he is in jail.


A few more comments, without addressing each sentence, because I have dismissed the whole, as detailed above.

> So, your statement above does not make any sense, if only because all
> schemas have at least one key, and all the attributes of a schema
> functionally depend on each key, by definition (Codd's definition).

You are not using Codd's definition, you are using a false report of it from a porcine source.

I have used Codd's definition, above, and eliminated your points.

The evidence is, you don't know the RM, despite your references to it, strange since you say you like it.

> I think I understand, however, what your point is, and I totally embrace
> it (as I bet others would do): when your client (unnecessarily) defines
> a schema with a surrogate key, he "forgets" about the other keys and
> data integrity is thrown out of the window.

That is the result of you having 42 fragments, with no understanding or integration. If you used the two definitions (RM and 3NF, which contains FD), all of which are integrated, and easy to understand, if you did not have ID columns posed as "keys", and the flagellating that goes with it, you would likely not "forget" such things. And you would create valid models in a small fraction of the time.

> Note, however, that a
> statement like "there are no keys on the data" is still inappropriate.

As evidenced above, there are no keys on the data. As evidenced, again, in your sentence above, you are ignorant of the RM.

> What you should rather complain about is that that logical schema design
> fails to capture the real constraints on the data.

Well, if you take it that my reasons (given , now detailed further) that I rejected the data model, are serious, and follow them to completion, the result would be, precisely, that the real relationships between the data will be exposed, and the constraints that are required therefore will also be exposed. In a direct manner. Instead of the backwards (starting from the back, trying to get to the front, as well as primitive, compared to Codd), long and winding journey that you people are taking.

> Technically, what
> your client has designed is a schema with only one functional dependency
> of all the attributes from the surrogate key. Given that, in a formal
> sense the schema is in 5NF - wrt *that* set of dependencies (which is
> *not* the set of "real" dependencies, that is the constraints that you
> have in mind when you think about countries, country names, etc...).
>
> I also agree that, by designing that way, the database becomes akin to a
> "record filing system". I wouldn't go as far as saying that it is "not
> relational",

I have given, both in the post that you quote, and above, the specific reasons why it is not Relational.

It is not Relational.

You do not know what Relational is.

> but certainly it is using the relational model in a bad way.

There is no "good" or "bad" way to use the RM. It is a set of rules, laws. There is only compliance.

There is a scientific way that we use to determine the precise extent of compliance, and thus how much a given model is (a) a pre-1970 Record Filing System at one end of the spectrum, and how much it (b) uses all Relational concepts, correctly, at the other end. But that is not relevant here. Here it fails Relational on basic issues, basic violations, full stop.

> So, Jan is correct

So far, in this thread, he is dead wrong.

I take it you mean the flagellating he is performing is familiar to you, and "correct" in terms of the performance.

> to point out that the normal form has nothing to do
> with how good or bad a logical design is.

Well, in the real world, which is integrated, it is one (not the only) measure, of exactly that, and we have just three (we have five but the last two are beyond your comprehension, out-of-scope for this thread). In the fractured universe, where everything is expressed in terms of isolated fragments (schizophrenia), sure, nothing has anything to do with anything else. And you have 42 NFs, which are tiny fragments of ours.

> You can get a highly
> normalized schema from a completely wrong set of constraints. You should
> complain about the latter in the first place.

I am complaining about both. But since it breaks simple essential rules, it is stupid for me to enumerate how the constraints on broken tables (which they reference) breaks further rules. It is more efficient to get the modeller to fix the essential breaches first, the tables and Keys, and then to deal with the constraints, etc, on tables that can be relied upon.

That emphasises, again that there is no point in evaluating your "fds", because Functionally Dependencies have to have a Key to be Functionally Dependent upon.

But you are free to approach the exercise backwards. And to write pages re "fds" that shoulda coulda woulda bin.

> Finally, talking about constraints and the meaning of the data, I have a
> (genuine) question: does it really happen that two addresses differing
> only in their Unit have different post codes (in my country, that is
> never the case, I think)? Aren't different units assigned when the same
> building has more than one entrance?

It is different in every country. The post code has different meaning in every country, and different granulation. In some countries it applies to an entire block (many apartment buildings); in others to a single entrance to a building, and there, it is a huge maintenance problem. That finer granularity is stupid, because they are trying to identify a specific geographic point, beyond the domain of the postal service, and we already have very good ways to do that (ie. GPS co-ordinates are not post codes). They should stick to their domain, of sorting mail for the letter carriers.

Cheers
Derek Received on Sat Feb 07 2015 - 06:04:21 CET

Original text of this message