Re: Surrogate Keys: an Implementation Issue

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Thu, 27 Jul 2006 13:20:10 GMT
Message-ID: <e43yg.27359$pu3.361813_at_ursa-nb00s0.nbnet.nb.ca>


JOG wrote:

> J M Davitt wrote:
>

>>[big snip]
>>
>>
>>>What does that have to do with the
>>>fact that Bob just updated the wrong row?
>>
>>The wrong row?  What was wrong about it?  The database was
>>told, for example, "Make Suzie's record look like this."
>>
>>The confusion arising because "Suzie is George and William
>>is Suzie but Bob thought George was still Suzie" isn't
>>going away just because you slap another value on all your
>>records; you're just moving the confusion around.

>
> There is miscommunication in this thread. Brian is coming from an
> assumption that primary keys are mutable, and everyone else from the RM
> specification that they should be immutable.

I disagree. While stability is an important design criterion for candidate keys, it is important due to the difficulty arising when trying to update a referenced key. This has nothing to do with Selzer's idiocy.

In the scenario painted by Selzer, Bob makes an assertion about something logically identified by some explicit value Bob provides. For the remainder, let's call that value ID0. Bob does this, presumably, because he knows what ID0 means and because he knows the assertion is true.

Assuming, as Selzer does, that assertions about whatever is identified by ID0 already exist in the database, there are two primary ways that Bob could make the new assertion. He could enter a new value for some attribute to replace some existing value, or he could change some value relative to its current value and/or to other values recorded in the database. If Bob makes multiple assertions regarding different attributes, we can look at each in isolation with the two options above. We can ignore the situation where Bob expresses his assertion relative to what is in the database because the actual value recorded will reflect any intervening updates. We only need to consider situations where Bob replaces some value.

First, let's consider the case where the truth of the assertion does not depend in any way on the current state of the database. In this case, Bob makes the assertion because he knows it is true regardless. Even if his update overwrites another person's update, the truth of the assertion does not depend on what was overwritten.

Next, let's consider the case where the truth of the assertion depends only on the current state of the database. In this case, the data is derivable and the dbms should derive the value without any data entry. Thus, having Bob enter anything was an error in the first place.

Finally, let's consider the case where the truth of the assertion depends both on the current state of the database and on some external information Bob has in his possession. In this case, the truth of the assertion is partially derivable. In which case, the dbms should at least be able to recognize underivable assertions.

In the final case above, if something changes that does not affect the truth of the assertion, it is an unimportant change that we can ignore. If something changes that does affect the truth of the assertion, the integrity function should detect that the assertion was not derivable given the new information.

The source of Selzer's problem is his own laziness and stupidity. Instead of treating the database as a set of assertions for a logic system, he treats it as a bunch of records. Selzer then writes shitty sloppy code that updates all of the fields of any record even though Bob made no assertions about most of them. Thus, Selzer's shitty sloppy code introduces a problem where intervening updates revert and get lost.

For example, suppose ID0 identifies an employee named Jane, and that Jane just got back from her honeymoon. Jane stops by Bob's desk down in personnel because there is a typo in her new surname, and Bob dutifully pulls up her information on his computer. Instead of entering the change right away, he gets distracted discussing Jane's honeymoon and reminiscing about his own trip to Maui several years earlier.

While this is all going on, Sue, who administered the benefits plan, gets the paperwork to change Jane's insurance coverage from the single plan to the family plan so that her new husband is covered. She dutifully updates the information in the database.

When Bob finally fixes the typo in Jane's new surname and saves his change, Selzer's shitty sloppy code doesn't just assert the correct surname but dozens of other facts including that Jane has registered for the single coverage in the health insurance plan.

  The primary key identifies
> a tuple, and if something's identity changes... well, conceptually, it
> is a completely different thing. Hence in the example you supply Brian,
> the RM is responding perfectly and as we would want it to.
>
> It is the database designer and his choice of primary key that is
> causing the problems. A person's name is an absolutely dire choice as a
> key given they are _not unique_ and _subject to change_. That's exactly
> why you guys have SSN's and I have an UK National Insurance ID. But
> then to me an SSN isn't a surrogate anyhow, it's a unique attribute.

All keys are surrogates. Natural keys are simply familiar surrogates. Received on Thu Jul 27 2006 - 15:20:10 CEST

Original text of this message