Re: Surrogate Keys: an Implementation Issue
Date: Mon, 24 Jul 2006 18:38:41 GMT
Message-ID: <Rs8xg.72439$Lm5.35322_at_newssvr12.news.prodigy.com>
"Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message
news:XQ5xg.14298$pu3.333248_at_ursa-nb00s0.nbnet.nb.ca...
> Brian Selzer wrote:
>
>> "Bernard Peek" <bap_at_shrdlu.com> wrote in message
>> news:xn0ep1e3sn11oo000_at_news.individual.net...
>>
>>>In comp.databases.theory Paul Mansour wrote:
>>>
>>>>The second is perhaps more profound. Consider a rollback database, or
>>>>a database that must provide a complete audit trail of every change.
>>>>For example, the database must provide the answer to "who changed
>>>>this SS number from X to Y, and when did they change it? As far as I
>>>>can tell, if there is no way to answer this without an immutable
>>>>identifier. ( I suppose you could design the DB to handle specific
>>>>cases, but I'm interested in DBMS with native rollback and audit
>>>>trail support.)
>>>
>>>If what you think of as the key is mutable it's not a natural key.
>>>Natural keys aren't mutable, at all, ever.
>>
>> Untrue! Natural keys are often mutable--especially compound keys.
>>
>>
>>>The problem here is that there isn't a usable natural key for
>>>identifying people. So all that's left are surrogates of varying
>>>quality. The SSN is a surrogate that works most of the time. If you
>>>assign someone a payroll number and tell tham that thay won't get paid
>>>unless they can quote it then you have a close to immutable key. But
>>>it's still a surrogate.
>>>
>>>I think the original post was part correct. Use a natural key when you
>>>have one. I'd just add to that, don't use a surrogate unless there is a
>>>compelling reason for it. Efficiency is unlikely to be a compelling
>>>argument in most cases.
>>
>> There is always a compelling reason to use surrogates: natural keys can
>> change. This makes it difficult--if not impossible--to detect changes to
>> rows. For example, Bob is preparing to update a row--that is, he has
>> read the row and is in the process of keying in a change. During that
>> time, another process updated several rows in the same table. Unless the
>> key is immutable, when Bob issues the update, there's no guarantee that
>> the row he's updating is the same one that was read out.
>
> So? Bob issues an update statement to change some attribute identified by
> a logical identifier. As long as the data Bob is entering is correct, why
> should he care what happens to any other attributes?
>
He may be overwriting a change made by another user.
>
> This problem is magnified
>> if there are rows related via a foreign key constraint because it's
>> possible for the referenced row to appear unchanged. So you're left with
>> either maintaining an exclusive lock on the row until Bob returns from
>> the golf outing, or adding additional columns and code in order to
>> determine with certainty whether or not a change occurred between the
>> time that a row was read and the time of the update.
>
> Or you can just write better applications that don't update anything that
> didn't change. By introducing some 'under-the-covers' identifying
> attribute, you create a risk that Bob will change some data identified by
> a familiar logical identifier and some other process in the meantime will
> associate that identifier with a different surrogate. Your application
> will then record the updates against the wrong logical identifier.
>
The Relational Model doesn't take into account duration, so the value of a
database at the time of a read is not necessarily the same as that at the
time of a write, nor is the meaning of a key with a specific atomic value at
the time of a read necessarily the same as the meaning of a key with the
same atomic value at the time of a write. Surrogates provide the continuity
across database states that the Relational Model lacks.
> It seems to me you just went to a lot of trouble to invent a problem that
> never existed in the first place.
>
I didn't invent this problem. I've run across it several times. It's
extremely difficult to diagnose and even more difficult to repair, since it
lets garbage get into the database.
>
>> Natural keys are necessary to maintain the integrity of the information
>> stored in the database; surrogates are necessary to maintain the
>> integrity of information during the time that it is in use by
>> applications.
>
> I disagree entirely.
Received on Mon Jul 24 2006 - 20:38:41 CEST