Re: Surrogate Keys: an Implementation Issue

From: Jay Dee <ais01479_at_aeneas.net>
Date: Tue, 25 Jul 2006 00:41:05 GMT
Message-ID: <BMdxg.35792$vl5.5254_at_tornado.ohiordc.rr.com>


JOG wrote:
> Brian Selzer wrote:
>

>>"Bernard Peek" <bap_at_shrdlu.com> wrote in message
>>news:xn0ep1e3sn11oo000_at_news.individual.net...
>>
>>>In comp.databases.theory Paul Mansour wrote:
>>>
>>>
>>>
>>>>The second is perhaps more profound. Consider a rollback database, or
>>>>a database that must provide a complete audit trail of every change.
>>>>For example, the database must provide the answer to "who changed
>>>>this SS number from  X to Y, and when did they change it? As far as I
>>>>can tell, if there is no way to answer this without an immutable
>>>>identifier. ( I suppose you could design the DB to handle specific
>>>>cases, but I'm interested in DBMS with native rollback and audit
>>>>trail support.)
>>>
>>>If what you think of as the key is mutable it's not a natural key.
>>>Natural keys aren't mutable, at all, ever.
>>
>>Untrue!  Natural keys are often mutable--especially compound keys.

>
>
> Well moreover, natural keys should not be mutable - no key should be in
> a perfect world. It is only a concsession to real world practicalities
> that they are mutable, and that we have to employ cascades to maintain
> integrity.
>
>
>>>The problem here is that there isn't a usable natural key for
>>>identifying people. So all that's left are surrogates of varying
>>>quality. The SSN is a surrogate that works most of the time. If you
>>>assign someone a payroll number and tell tham that thay won't get paid
>>>unless they can quote it then you have a close to immutable key. But
>>>it's still a surrogate.
>>>
>>>I think the original post was part correct. Use a natural key when you
>>>have one. I'd just add to that, don't use a surrogate unless there is a
>>>compelling reason for it. Efficiency is unlikely to be a compelling
>>>argument in most cases.
>>>
>>>
>>
>>There is always a compelling reason to use surrogates: natural keys can
>>change.  This makes it difficult--if not impossible--to detect changes to
>>rows.  For example, Bob is preparing to update a row--that is, he has read
>>the row and is in the process of keying in a change.  During that time,
>>another process updated several rows in the same table.  Unless the key is
>>immutable, when Bob issues the update, there's no guarantee that the row
>>he's updating is the same one that was read out.  This problem is magnified
>>if there are rows related via a foreign key constraint because it's possible
>>for the referenced row to appear unchanged.  So you're left with either
>>maintaining an exclusive lock on the row until Bob returns from the golf
>>outing, or adding additional columns and code in order to determine with
>>certainty whether or not a change occurred between the time that a row was
>>read and the time of the update.

>
>
> Agreed. For myself I view surrogates as being useful in two situations,
> stemming from quite different points - first when naturals have a high
> potential for change, use of surrogates may provide easier maintainment
> of integrity, given they are, at least initially, under the control of
> the DBMS.Second, if a relation relies on a highly complex compound key
> use of an indexed surrogate might be acceptable in situation where
> query processing speeds are critical. The latter however is an issue of
> consequences at the physical level impinging on the logical level and
> so clearly an uncomfortable concession.
>
> Outside these two situations I advantages in surrogates. And how long
> a DBMS can maintain strict control of an internally generated surrogate
> key before it leaks out into the real world is debatable.

I've heard that surrogate keys are necessary when values used as keys change. That's never made sense to me. If the entity known as 'Jones' is, henceforth, to be known as 'Smith,' what reason do we have assigning the key 'A5600032' to that entity? Now, if one wants to know, "What was Smith's name before?" we've got a different question on our hands -- and not one that a surrogate key will answer. I mean, you still have to know 'Jones' or 'Smith,' right? And if users are simultaneously posting entries which contradict each other -- well, the database really has no business other than doing as it's told, right? How can a design be expected to determine whether the more correct data have been recorded?

As for query response: Well, perhaps that *was* true, but I am more and more convinced that hacks at the logical design made to accommodate poor-performing products are terrible mistakes.

>>Natural keys are necessary to maintain the integrity of the information
>>stored in the database; surrogates are necessary to maintain the integrity
>>of information during the time that it is in use by applications.

This is a curious distinction...

>>
>>>-- 
>>>-- 
>>>Bernard Peek
>>>bap_at_shrdlu.com
>>>

>
>
Received on Tue Jul 25 2006 - 02:41:05 CEST

Original text of this message