Re: Little design mistakes that can be easily avoided (2): Listenning to CELKO (and CELKO alikes)

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 28 May 2007 20:31:08 GMT
Message-ID: <g_G6i.12426$RX.451_at_newssvr11.news.prodigy.net>


"Matthias Klaey" <mpky_at_hotmail.com> wrote in message news:rd6l53tm7fp7d1aju5v335gcgaebji8r0e_at_4ax.com...
> "Brian Selzer" <brian_at_selzer-software.com> wrote:
>
>>
>>"Matthias Klaey" <mpky_at_hotmail.com> wrote in message
>>news:5f2k53pq5srphk68s1op58udei9h60stuo_at_4ax.com...
>>> "Brian Selzer" <brian_at_selzer-software.com> wrote:
>>>
>>>>
>>>>"Matthias Klaey" <mpky_at_hotmail.com> wrote in message
>>>>news:bipi53pjf51gluul6b3b8uv86kbq7hanse_at_4ax.com...
>>>>> Cimode <cimode_at_hotmail.com> wrote:
>>>>>
>>>>>>
>>>>>>Matthias Klaey a écrit :
>>>>>>> Cimode <cimode_at_hotmail.com> wrote:
>>>>>>>
>>>>>>> >On 26 mai, 18:18, Matthias Klaey <m..._at_hotmail.com> wrote:
>>>>>>> >> Cimode <cim..._at_hotmail.com> wrote:
>>>>>>> >> >Hi,
>>>>>
>>>>> [...]
>>>>>
> [...]
>
>>>>>
>>>>
>>>>If you've studied it, they you know that Celko's failed to mention the
>>>>last
>>>>paragraph of section 4, which is in direct contradiction to his rant.
>>>>I've
>>>>pointed this and other things out to him before, but he still dumps the
>>>>same
>>>>flawed argument on people all the time. In other words, he *knowingly*
>>>>mischaracterizes Codd's position to advance his own agenda. (He wants
>>>>people to buy his books!)
>>>
> [...]
>
>>> Or this one? Paper 2, Section 4 last paragraph
>>>
>>> "They will have to remember, however, that it is now the surrogate
>>> that is the primary key and provides truly permanent identification
>>> of each entity. The capability of making equi-joins on surrogates
>>> implies that users see the headings of such columns but not the
>>> specific values in those columns."
>>>
>>> And how, specifically, does any of these paragraphs contradict Celkos
>>> "rant" (sic!)? Does it contradict the whole of it, or just part of it,
>>> and if the latter is the case, which part?
>>>
>>
>>This is the one. It directly contradicts the following paragraph in
>>Celko's
>>["rant" is a bit more polite than "spew" It certainly doesn't rise to the
>>level of an argument. The fallacies and deliberate attempt to mislead
>>place
>>it more in the category of propaganda than anything else.]:
>>
>><<
>>This means that a surrogate ought to act like an index; created by the
>>user, managed by the system and NEVER seen by a user. That means
>>never used in queries, DRI or anything else that a user does.
>
>>
>>Which sums up his entire argument against surrogates. He then uses the
>>term
>>"authority" as if in the article Codd was arguing against surrogates, when
>>the section he quoted (or deliberately misquoted) is a clear argument in
>>favor of surrogates! Now, while I'm not sure I agree with Codd when it
>>comes to hiding surrogate key values, I certainly wouldn't twist his words
>>completely around in order to support my own agenda!
>
> I don't think that Celko argues *against* surrogates, in sharp
> contrast to "exposed physical locators". He clearly makes the
> distinction between the two types.
>

If I understand him correctly, Celko states that the values for a surrogate key are a function of the other values in a row. Clearly, this does not fulfill the requirement that the values provide permanent identification for an object, since that value would necessarily change along with the rest of the row.

> The only issue I see is about the visibility of surrogate key
> *headings* in equi-joins. The values have no meaning whatsoever to the
> user, as s/he cannot use them in Inserts, Updates, or Deletes.
>

A surrogate key value does indeed have a meaning. It is a symbol that represents in the abstract the essence of an object in the universe of discourse, that is, that which makes one object different from all other objects that have ever existed or will ever exist. The E-domian from RM/T encapsulates this metaphysical notion in one place by enumerating symbolically the essence of each and every object that can ever be discussed (be referenced in a database extension). This means that there is a 1:1 mapping *independent of time* between the objects under discussion and the values in the E-domain. As a result, a value drawn from the E-domain identifies an object not only throughout a single extension of the database (which is all that a candidate key can guarantee), but throughout all possible extensions. Contrast this with the set of values for a composite natural key which may only distinguish an object from all others during a particular interval in time: if you update a row in a table where the entire heading is the key, are you selecting a different object in the universe of discourse, or is the new set of values the current state of affairs for the same object? If there is a surrogate key, then there is no question: if the surrogate key value is different, then a different object has been selected, otherwise, the new set of values represents the current state of affairs for the same object.

Now, I agree that an end-user may not be able to make sense of this--from the content of this and other discussions on this and other newsgroups, it is clear that even so-called experts aren't able to grasp the concept, even though Codd wrote about it back in 1979. In addition, since surrogate key values are symbols, how such symbols are represented is completely arbitrary. All that is required is that the representative values be distinct. Presenting such abstract values to people who don't have a clue what they represent is dangerous. So if "the user" means "the end user," then I agree with Codd that the values should be hidden. On the other hand, if "the user" means "the application developer" or "the database designer," then I don't agree. A permanent identifier for each object makes optimistic concurrency control possible, ensures consistency when using disconnected recordsets and datasets, and simplifies the representation of temporal data and the definition of temporal constraints, including transition constraints.

A simple solution is to provide the end-users with a set of updatable views that do not include surrogate keys in their headings, and through security, prevent ad-hoc access to the base tables. It's the best of both worlds, since the end-users never see the surrogate key values, whereas the applications and constraints that expect a key value to identify the same object at different points in time do.

> I do not understand why it should be neccessary to display the
> headings of the surrogate keys in an equi-join. If I know that
> relations R and S have surrogate (primary) keys, I should be able to
> write a view
>
> equi-join(R, S)
>
> without even having to know the names of the primary keys.
>
> Greetings
> Matthias Kläy
> --
> www.kcc.ch
Received on Mon May 28 2007 - 22:31:08 CEST

Original text of this message