Re: Surrogate Keys: an Implementation Issue

From: Brian Selzer <brian_at_selzer-software.com>
Date: Tue, 25 Jul 2006 12:04:17 GMT
Message-ID: <5Nnxg.137717$H71.29072_at_newssvr13.news.prodigy.com>


"Jay Dee" <ais01479_at_aeneas.net> wrote in message news:TMdxg.57794$Eh1.40588_at_tornado.ohiordc.rr.com...

> Brian Selzer wrote:

>> "Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message
>> news:XQ5xg.14298$pu3.333248_at_ursa-nb00s0.nbnet.nb.ca...
>>
>>>Brian Selzer wrote:
>>>
>>>
>>>>"Bernard Peek" <bap_at_shrdlu.com> wrote in message
>>>>news:xn0ep1e3sn11oo000_at_news.individual.net...
>>>>
>>>>
>>>>>In comp.databases.theory Paul Mansour wrote:
>>>>>
>>>>>
>>>>>>The second is perhaps more profound. Consider a rollback database, or
>>>>>>a database that must provide a complete audit trail of every change.
>>>>>>For example, the database must provide the answer to "who changed
>>>>>>this SS number from X to Y, and when did they change it? As far as I
>>>>>>can tell, if there is no way to answer this without an immutable
>>>>>>identifier. ( I suppose you could design the DB to handle specific
>>>>>>cases, but I'm interested in DBMS with native rollback and audit
>>>>>>trail support.)
>>>>>
>>>>>If what you think of as the key is mutable it's not a natural key.
>>>>>Natural keys aren't mutable, at all, ever.
>>>>
>>>>Untrue! Natural keys are often mutable--especially compound keys.
>>>>
>>>>
>>>>
>>>>>The problem here is that there isn't a usable natural key for
>>>>>identifying people. So all that's left are surrogates of varying
>>>>>quality. The SSN is a surrogate that works most of the time. If you
>>>>>assign someone a payroll number and tell tham that thay won't get paid
>>>>>unless they can quote it then you have a close to immutable key. But
>>>>>it's still a surrogate.
>>>>>
>>>>>I think the original post was part correct. Use a natural key when you
>>>>>have one. I'd just add to that, don't use a surrogate unless there is a
>>>>>compelling reason for it. Efficiency is unlikely to be a compelling
>>>>>argument in most cases.
>>>>
>>>>There is always a compelling reason to use surrogates: natural keys can
>>>>change. This makes it difficult--if not impossible--to detect changes
>>>>to rows. For example, Bob is preparing to update a row--that is, he has
>>>>read the row and is in the process of keying in a change. During that
>>>>time, another process updated several rows in the same table. Unless
>>>>the key is immutable, when Bob issues the update, there's no guarantee
>>>>that the row he's updating is the same one that was read out.
>>>
>>>So? Bob issues an update statement to change some attribute identified by
>>>a logical identifier. As long as the data Bob is entering is correct, why
>>>should he care what happens to any other attributes?
>>>
>>
>>
>> He may be overwriting a change made by another user.
>>
>>
>>> This problem is magnified
>>>
>>>>if there are rows related via a foreign key constraint because it's
>>>>possible for the referenced row to appear unchanged. So you're left
>>>>with either maintaining an exclusive lock on the row until Bob returns
>>>>from the golf outing, or adding additional columns and code in order to
>>>>determine with certainty whether or not a change occurred between the
>>>>time that a row was read and the time of the update.
>>>
>>>Or you can just write better applications that don't update anything that
>>>didn't change. By introducing some 'under-the-covers' identifying
>>>attribute, you create a risk that Bob will change some data identified by
>>>a familiar logical identifier and some other process in the meantime will
>>>associate that identifier with a different surrogate. Your application
>>>will then record the updates against the wrong logical identifier.
>>>
>>
>>
>> You're missing the point. The change Bob's making may change the row,
>> but because several changes occurred to the table while Bob was keying in
>> his change, the row he's about to change may represent some other entity
>> altogether.
>
> This is the scenario you're complaining about?  That Bob may come back
> after an hour or two and mash the 'Store' button on his interface and
> overwrite whatever Suzie or Betty or Ralph have updated since Bob
> pressed his 'Fetch' button?
>
> Surrogates ain't gonna fix that.
>

Sure they will, because now the application can determine with certainty whether or not a specific row changed during that interval. There is no possibility that that row refers to a different entity, nor will there be erroneous indications, such as "row not found" caused by a key change. Instead, the application can inform the user exactly what happened and what is different.

>> The Relational Model doesn't take into account duration, so the value of
>> a database at the time of a read is not necessarily the same as that at
>> the time of a write, nor is the meaning of a key with a specific atomic
>> value at the time of a read necessarily the same as the meaning of a key
>> with the same atomic value at the time of a write. Surrogates provide
>> the continuity across database states that the Relational Model lacks.
>>
>>
>>>It seems to me you just went to a lot of trouble to invent a problem that
>>>never existed in the first place.
>>>
>>
>>
>> I didn't invent this problem. I've run across it several times. It's
>> extremely difficult to diagnose and even more difficult to repair, since
>> it lets garbage get into the database.
>>
>>
>>>>Natural keys are necessary to maintain the integrity of the information
>>>>stored in the database; surrogates are necessary to maintain the
>>>>integrity of information during the time that it is in use by
>>>>applications.
>>>
>>>I disagree entirely.
>>
>>
Received on Tue Jul 25 2006 - 14:04:17 CEST

Original text of this message