Re: Surrogate Keys: an Implementation Issue

From: Brian Selzer <brian_at_selzer-software.com>
Date: Fri, 28 Jul 2006 01:36:48 GMT
Message-ID: <QSdyg.73474$fb2.6921_at_newssvr27.news.prodigy.net>


"Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message news:e43yg.27359$pu3.361813_at_ursa-nb00s0.nbnet.nb.ca...
> JOG wrote:
>
>> J M Davitt wrote:
>>
>>>[big snip]
>>>
>>>
>>>>What does that have to do with the
>>>>fact that Bob just updated the wrong row?
>>>
>>>The wrong row? What was wrong about it? The database was
>>>told, for example, "Make Suzie's record look like this."
>>>
>>>The confusion arising because "Suzie is George and William
>>>is Suzie but Bob thought George was still Suzie" isn't
>>>going away just because you slap another value on all your
>>>records; you're just moving the confusion around.
>>
>> There is miscommunication in this thread. Brian is coming from an
>> assumption that primary keys are mutable, and everyone else from the RM
>> specification that they should be immutable.
>
> I disagree. While stability is an important design criterion for candidate
> keys, it is important due to the difficulty arising when trying to update
> a referenced key. This has nothing to do with Selzer's idiocy.
>

How much work have you done in the field? This comment makes you sound like a neophyte.

> In the scenario painted by Selzer, Bob makes an assertion about something
> logically identified by some explicit value Bob provides. For the
> remainder, let's call that value ID0. Bob does this, presumably, because
> he knows what ID0 means and because he knows the assertion is true.
>

But Bob isn't the only user. What Bob thinks he knows to be true may no longer be true.

> Assuming, as Selzer does, that assertions about whatever is identified by
> ID0 already exist in the database, there are two primary ways that Bob
> could make the new assertion. He could enter a new value for some
> attribute to replace some existing value, or he could change some value
> relative to its current value and/or to other values recorded in the
> database. If Bob makes multiple assertions regarding different attributes,
> we can look at each in isolation with the two options above. We can ignore
> the situation where Bob expresses his assertion relative to what is in the
> database because the actual value recorded will reflect any intervening
> updates. We only need to consider situations where Bob replaces some
> value.
>

How can you possibly argue that you can look at each attribute in isolation? I think you need to go back to school to learn about the various types of dependencies.

> First, let's consider the case where the truth of the assertion does not
> depend in any way on the current state of the database. In this case, Bob
> makes the assertion because he knows it is true regardless. Even if his
> update overwrites another person's update, the truth of the assertion does
> not depend on what was overwritten.
>
> Next, let's consider the case where the truth of the assertion depends
> only on the current state of the database. In this case, the data is
> derivable and the dbms should derive the value without any data entry.
> Thus, having Bob enter anything was an error in the first place.
>
> Finally, let's consider the case where the truth of the assertion depends
> both on the current state of the database and on some external information
> Bob has in his possession. In this case, the truth of the assertion is
> partially derivable. In which case, the dbms should at least be able to
> recognize underivable assertions.
>
> In the final case above, if something changes that does not affect the
> truth of the assertion, it is an unimportant change that we can ignore. If
> something changes that does affect the truth of the assertion, the
> integrity function should detect that the assertion was not derivable
> given the new information.
>

Incoherent gibberish. By the way, a dbms can only guarantee consistency, not correctness.

> The source of Selzer's problem is his own laziness and stupidity. Instead
> of treating the database as a set of assertions for a logic system, he
> treats it as a bunch of records. Selzer then writes shitty sloppy code
> that updates all of the fields of any record even though Bob made no
> assertions about most of them. Thus, Selzer's shitty sloppy code
> introduces a problem where intervening updates revert and get lost.
>
> For example, suppose ID0 identifies an employee named Jane, and that Jane
> just got back from her honeymoon. Jane stops by Bob's desk down in
> personnel because there is a typo in her new surname, and Bob dutifully
> pulls up her information on his computer. Instead of entering the change
> right away, he gets distracted discussing Jane's honeymoon and reminiscing
> about his own trip to Maui several years earlier.
>
> While this is all going on, Sue, who administered the benefits plan, gets
> the paperwork to change Jane's insurance coverage from the single plan to
> the family plan so that her new husband is covered. She dutifully updates
> the information in the database.
>
> When Bob finally fixes the typo in Jane's new surname and saves his
> change, Selzer's shitty sloppy code doesn't just assert the correct
> surname but dozens of other facts including that Jane has registered for
> the single coverage in the health insurance plan.
>

This was the most asinine tyrade I've heard in a long while. I'm still laughing.

The relational model has only one operation, assignment, which involves replacing the value of one or more relations in their entirety, so it doesn't matter whether only one attribute on one tuple is modified, the entire relation is replaced.

You appear to be operating under the assumption that only what Bob asserts is important. That is seldom the case, especially when the information Bob's using to make the assertion is stale.

>
> The primary key identifies
>> a tuple, and if something's identity changes... well, conceptually, it
>> is a completely different thing. Hence in the example you supply Brian,
>> the RM is responding perfectly and as we would want it to.
>>
>> It is the database designer and his choice of primary key that is
>> causing the problems. A person's name is an absolutely dire choice as a
>> key given they are _not unique_ and _subject to change_. That's exactly
>> why you guys have SSN's and I have an UK National Insurance ID. But
>> then to me an SSN isn't a surrogate anyhow, it's a unique attribute.
>
> All keys are surrogates. Natural keys are simply familiar surrogates.
Received on Fri Jul 28 2006 - 03:36:48 CEST

Original text of this message