Re: Surrogate Keys: an Implementation Issue

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Tue, 25 Jul 2006 03:16:07 GMT
Message-ID: <X1gxg.15744$pu3.340040_at_ursa-nb00s0.nbnet.nb.ca>


Brian Selzer wrote:

> "Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message 
> news:XQ5xg.14298$pu3.333248_at_ursa-nb00s0.nbnet.nb.ca...
> 

>>Brian Selzer wrote:
>>
>>>"Bernard Peek" <bap_at_shrdlu.com> wrote in message
>>>news:xn0ep1e3sn11oo000_at_news.individual.net...
>>>
>>>>In comp.databases.theory Paul Mansour wrote:
>>>>
>>>>>The second is perhaps more profound. Consider a rollback database, or
>>>>>a database that must provide a complete audit trail of every change.
>>>>>For example, the database must provide the answer to "who changed
>>>>>this SS number from X to Y, and when did they change it? As far as I
>>>>>can tell, if there is no way to answer this without an immutable
>>>>>identifier. ( I suppose you could design the DB to handle specific
>>>>>cases, but I'm interested in DBMS with native rollback and audit
>>>>>trail support.)
>>>>
>>>>If what you think of as the key is mutable it's not a natural key.
>>>>Natural keys aren't mutable, at all, ever.
>>>
>>>Untrue! Natural keys are often mutable--especially compound keys.
>>>
>>>>The problem here is that there isn't a usable natural key for
>>>>identifying people. So all that's left are surrogates of varying
>>>>quality. The SSN is a surrogate that works most of the time. If you
>>>>assign someone a payroll number and tell tham that thay won't get paid
>>>>unless they can quote it then you have a close to immutable key. But
>>>>it's still a surrogate.
>>>>
>>>>I think the original post was part correct. Use a natural key when you
>>>>have one. I'd just add to that, don't use a surrogate unless there is a
>>>>compelling reason for it. Efficiency is unlikely to be a compelling
>>>>argument in most cases.
>>>
>>>There is always a compelling reason to use surrogates: natural keys can
>>>change. This makes it difficult--if not impossible--to detect changes to
>>>rows. For example, Bob is preparing to update a row--that is, he has
>>>read the row and is in the process of keying in a change. During that
>>>time, another process updated several rows in the same table. Unless the
>>>key is immutable, when Bob issues the update, there's no guarantee that
>>>the row he's updating is the same one that was read out.
>>
>>So? Bob issues an update statement to change some attribute identified by
>>a logical identifier. As long as the data Bob is entering is correct, why
>>should he care what happens to any other attributes?
>
> He may be overwriting a change made by another user.

The only relevant concern is whether the update is correct. If the update is correct, it matters not at all that it overwrites something, which is presumably no longer correct.

>> This problem is magnified
>>
>>>if there are rows related via a foreign key constraint because it's
>>>possible for the referenced row to appear unchanged. So you're left with
>>>either maintaining an exclusive lock on the row until Bob returns from
>>>the golf outing, or adding additional columns and code in order to
>>>determine with certainty whether or not a change occurred between the
>>>time that a row was read and the time of the update.
>>
>>Or you can just write better applications that don't update anything that
>>didn't change. By introducing some 'under-the-covers' identifying
>>attribute, you create a risk that Bob will change some data identified by
>>a familiar logical identifier and some other process in the meantime will
>>associate that identifier with a different surrogate. Your application
>>will then record the updates against the wrong logical identifier.

> 
> You're missing the point.  The change Bob's making may change the row, but 
> because several changes occurred to the table while Bob was keying in his 
> change, the row he's about to change may represent some other entity 
> altogether.

Using logical identity and natural keys, that's not possible. The key that Bob specifies identifies the entity. And if Bob updates the entire row, the entire row needs updating. I draw your attention, in particular, to my observation that one can just write better applications that don't update anything that didn't change.

The only time Bob's change could update the wrong entity is if the application Bob is using identifies the 'row' using a surrogate key that Bob cannot see.

> The Relational Model doesn't take into account duration

You have yet to establish that it needs to.

, so the value of a
> database at the time of a read is not necessarily the same as that at the > time of a write

This only matters if one writes shitty applications with sloppy code. See my earlier posts.

, nor is the meaning of a key with a specific atomic value at > the time of a read necessarily the same as the meaning of a key with the > same atomic value at the time of a write.

Your statement is only possible if the value of the key is hidden from the user. Thus, you invented the problem by introducing a surrogate. Otherwise, values do not change their meanings. They have the same meaning for all time.

   Surrogates provide the continuity
> across database states that the Relational Model lacks.

I disagree. Hidden surrogates re-introduce all the problems of location-based updates. The use of logical identity through exposed values obviates your entire argument. Apparently, you lack the ability to recognize that fact.

>>It seems to me you just went to a lot of trouble to invent a problem that
>>never existed in the first place.
>
> I didn't invent this problem.

I disagree. Had you not introduced a hidden surrogate, the problem would never have existed.

   I've run across it several times. It's > extremely difficult to diagnose and even more difficult to repair, since it > lets garbage get into the database.

Shitty applications with sloppy code force garbage into the database. Stop writing them.

>>>Natural keys are necessary to maintain the integrity of the information
>>>stored in the database; surrogates are necessary to maintain the
>>>integrity of information during the time that it is in use by
>>>applications.
>>
>>I disagree entirely.
Received on Tue Jul 25 2006 - 05:16:07 CEST

Original text of this message