Re: Surrogate Keys: an Implementation Issue

From: paul c <toledobythesea_at_oohay.ac>
Date: Thu, 27 Jul 2006 05:11:45 GMT
Message-ID: <lWXxg.243067$iF6.132287_at_pd7tw2no>


Brian Selzer wrote:
> "Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message
> news:PPMxg.21411$pu3.353655_at_ursa-nb00s0.nbnet.nb.ca...

>> Brian Selzer wrote:
>>
>>> "Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message 
>>> news:X1gxg.15744$pu3.340040_at_ursa-nb00s0.nbnet.nb.ca...
>>>
>>>> Brian Selzer wrote:
>>>>
>>>>
>>>>> "Bob Badour" <bbadour_at_pei.sympatico.ca> wrote in message 
>>>>> news:XQ5xg.14298$pu3.333248_at_ursa-nb00s0.nbnet.nb.ca...
>>>>>
>>>>>
>>>>>> Brian Selzer wrote:
>>>>>>
>>>>>>
>>>>>>> "Bernard Peek" <bap_at_shrdlu.com> wrote in message 
>>>>>>> news:xn0ep1e3sn11oo000_at_news.individual.net...
>>>>>>>
>>>>>>>
>>>>>>>> In comp.databases.theory Paul Mansour wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> The second is perhaps more profound. Consider a rollback database, 
>>>>>>>>> or
>>>>>>>>> a database that must provide a complete audit trail of every change.
>>>>>>>>> For example, the database must provide the answer to "who changed
>>>>>>>>> this SS number from  X to Y, and when did they change it? As far as 
>>>>>>>>> I
>>>>>>>>> can tell, if there is no way to answer this without an immutable
>>>>>>>>> identifier. ( I suppose you could design the DB to handle specific
>>>>>>>>> cases, but I'm interested in DBMS with native rollback and audit
>>>>>>>>> trail support.)
>>>>>>>> If what you think of as the key is mutable it's not a natural key.
>>>>>>>> Natural keys aren't mutable, at all, ever.
>>>>>>> Untrue!  Natural keys are often mutable--especially compound keys.
>>>>>>>
>>>>>>>
>>>>>>>> The problem here is that there isn't a usable natural key for
>>>>>>>> identifying people. So all that's left are surrogates of varying
>>>>>>>> quality. The SSN is a surrogate that works most of the time. If you
>>>>>>>> assign someone a payroll number and tell tham that thay won't get 
>>>>>>>> paid
>>>>>>>> unless they can quote it then you have a close to immutable key. But
>>>>>>>> it's still a surrogate.
>>>>>>>>
>>>>>>>> I think the original post was part correct. Use a natural key when 
>>>>>>>> you
>>>>>>>> have one. I'd just add to that, don't use a surrogate unless there is 
>>>>>>>> a
>>>>>>>> compelling reason for it. Efficiency is unlikely to be a compelling
>>>>>>>> argument in most cases.
>>>>>>> There is always a compelling reason to use surrogates: natural keys 
>>>>>>> can change.  This makes it difficult--if not impossible--to detect 
>>>>>>> changes to rows.  For example, Bob is preparing to update a row--that 
>>>>>>> is, he has read the row and is in the process of keying in a change. 
>>>>>>> During that time, another process updated several rows in the same 
>>>>>>> table.  Unless the key is immutable, when Bob issues the update, 
>>>>>>> there's no guarantee that the row he's updating is the same one that 
>>>>>>> was read out.
>>>>>> So? Bob issues an update statement to change some attribute identified 
>>>>>> by a logical identifier. As long as the data Bob is entering is 
>>>>>> correct, why should he care what happens to any other attributes?
>>>>> He may be overwriting a change made by another user.
>>>> The only relevant concern is whether the update is correct. If the update 
>>>> is correct, it matters not at all that it overwrites something, which is 
>>>> presumably no longer correct.
>>>>
>>>>
>>>>
>>>>>> This problem is magnified
>>>>>>
>>>>>>
>>>>>>> if there are rows related via a foreign key constraint because it's 
>>>>>>> possible for the referenced row to appear unchanged.  So you're left 
>>>>>>> with either maintaining an exclusive lock on the row until Bob returns
>>>>>> >from the golf outing, or adding additional columns and code in order 
>>>>>>> to
>>>>>>> determine with certainty whether or not a change occurred between the 
>>>>>>> time that a row was read and the time of the update.
>>>>>> Or you can just write better applications that don't update anything 
>>>>>> that didn't change. By introducing some 'under-the-covers' identifying 
>>>>>> attribute, you create a risk that Bob will change some data identified 
>>>>>> by a familiar logical identifier and some other process in the meantime 
>>>>>> will associate that identifier with a different surrogate. Your 
>>>>>> application will then record the updates against the wrong logical 
>>>>>> identifier.
>>>>> You're missing the point.  The change Bob's making may change the row, 
>>>>> but because several changes occurred to the table while Bob was keying 
>>>>> in his change, the row he's about to change may represent some other 
>>>>> entity altogether.
>>>> Using logical identity and natural keys, that's not possible. The key 
>>>> that Bob specifies identifies the entity. And if Bob updates the entire 
>>>> row, the entire row needs updating. I draw your attention, in particular, 
>>>> to my observation that one can just write better applications that don't 
>>>> update anything that didn't change.
>>> Of course it's possible.  The key that Bob specifies identifies the 
>>> entity at the time of the read from the database.  By the time that the 
>>> write occurs, that same key now refers to a different entity.
>> Are you stupid or just ignorant? What you state above can only happen if 
>> one assumes one is using a hidden surrogate. If one uses explicit values 
>> that Bob interacts with, then the value identifies a single entity. That 
>> identity does not change.
>>

>
> From Bob's perspective, what he would expect is that the entity that he's
> working on, identified by the key values he used to retrieve it, would not
> change; however, since the database is no longer the same when Bob attempts
> to commit, the entity that he was working on may either no longer exist, or
> may have been changed by another user. During that same interval, a second
> entity may have been changed so that by the time Bob attempts to commit,
> that second entity may be identified with respect to the new database state
> by the exact same key values that Bob read out.
> ...

I think this is misreading the information principle and perhaps it is due to falling into the trap of thinking typical lock manager behaviour has something to do with RT. Even if it's not that, it's a mistake. It doesn't matter if the rows involved (not necessarily the ones updated) were deleted twenty times and re-inserted. If the invoking transaction were required to re-iterate the 'rows' it read whose values it considered to be crucial for a correct update, then there would less chance of falling into this trap.

p Received on Thu Jul 27 2006 - 07:11:45 CEST

Original text of this message