Re: Guidelines to a decent support of surrogate key implementation

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sat, 26 May 2007 15:14:14 GMT
Message-ID: <a9Y5i.2427$5j1.2316_at_newssvr21.news.prodigy.net>


"Cimode" <cimode_at_hotmail.com> wrote in message

news:1180031263.364874.174190_at_q75g2000hsh.googlegroups.com...

> On 24 mai, 20:14, DBMS_Plumber <paul_geoffrey_br..._at_yahoo.com> wrote:
>> On May 24, 9:27 am, "Brian Selzer" <b..._at_selzer-software.com> wrote:
>>
>>
>>
>> > CON
>> > Keys can change over time.  Consider the following:
>>
>> > R{A, B, C} | AB --> C, and several other relations referencing R[AB]. 
>> > R:
>> > {A:1, B:2, C:3}
>> > {A:2, B:2, C:2}
>>
>> > T1: User 1 reads row {A:1, B:2, C:3}, starts a long-running 
>> > calculation.
>>
>> > T2: User 2 updates row {A:1, B:2, C:3} to {A:3, B:2, C:3} updates 
>> > cascade
>> > throughout the database. R:
>> > {A:3, B:2, C:3}
>> > {A:2, B:2, C:2}
>>
>> > T3: User 3 updates row {A:2, B:2, C:2} to {A:1, B:2, C:3} updates 
>> > cascade
>> > throughout the database. R:
>> > {A:3, B:2, C:3}
>> > {A:1, B:2, C:3}
>>
>> > T4: User 1 updates row {A:1, B:2, C:3} to {A:1, B:2, C:4} -- BUG!!!!
>>
>>  All your example does is show that users who don't understand
>> isolation levels will create bugs.
>>
>>  If User 1 was planning to update the tuple they had read, then they
>> damn well better ensure that their read is repeatable. This would
>> block User 2 and User 3.
>>
>>   How is this any different than the case where you choose to
>> concatenate R[A,B] into a single attribute? Let me repeat your example
>>
>> R { A, B } A->B
>>
>> [ A:1, B:1 ]
>> [ A:2, B:2 ]
>>
>>  T1: User 1 Read(A:1,B:1) (Wander off to compute F(B:1) )
>>  T2: User 2 Update(A:1,B:1) -> (A:3,B:1)
>>  T3: User 3 Update(A:2,B:2) -> (A:1,B:2)
>>  T4: User 1 Update(A:1,X) -> (A:1, F(B:1) )  <-- Bug
>>
>>  Let me also say that the biggest PRO of Cimode's Approach # 1 is
>> purely performance. It would be simpler and smaller (fewer indices)
>> and would make all kinds of nice things possible wrt.physical query
>> plans.
>>
>>   Why has it not been done? Because making this change involves
>> serious surgery to the DBMS. Try getting that past the bean
>> counters ....
>

> Let me summarize the latest arguments put...
>

> LINE1> People who think that surrogate key should be an internal
> physical mechanism to the dbms and invisible to designer who could
> focus on the logical selection of primary keys
> LINE2> People who think that a trdbms should allow the designer to
> have some control
> LINE3> People who think there is nothing wrong with current surrogate
> key implementations
>

> I personally belong to first category.
>

> What do you think are the Pro/Cons of each approach? Here is mine:
>

> LINE1>
> PRO
> Designer can focus better on logical design. Less
> error prone in
> key implementation
> Independence between the logical and physical layer is
> implemented
> at design time.
> DBM_Plumber: Performance enhanced because of better
> showplans
>

> CON
> Surrogate key implementation becoming internal
> addressing pointers,
> users can not find any more familiarity in surrogate keys. Assuming
> familiarity is an advantage, that would become a CON.
>

> LINE2>
> PRO
> Familiarity
> Brian: Keys should be subject to change (a violation
> of *stability* requirement for pk's)

Just to clarify: natural keys *can* change. Whether they *should* depends on the situation. Consider a table where the entire heading is the key. Does the stability requirement prohibit updates to such a table? The stability requirement applies less to keys than to the domains from which their values are drawn. It shouldn't be wrong for a user to select a different value in the key domain, but what should be avoided is the user changing the representation of that value within its domain. For example, it shouldn't be wrong to select Part 34253 instead of Part 23134, but what should be avoided is changing the symbol "34253" that represents a specific Part.

I think also that you missed the entire point I was trying to make. Let me put it another way. The intension of a database is universally quantified with respect to time, whereas the extension of a database is existentially quantified. Domains belong to the intension of a database, whereas combinations of domains are materialized in an extension. According to Codd [RM/V2, 1990], "the concept of keys in the relational model were always intended to identify objects in the micro-world that the database is supposed to represent. In other words, keys in the relational model act as surrogates for the objects being modeled." For unary keys, that identification is permanent, or independent of time, since domains belong to the intension of a database and thus are supposed to be invariant. So every time a value from a unary key's domain appears in any extension of the database, that value represents one and only one object in the universe of discourse. For composite keys, on the other hand, that identification is time dependent. Throughout a single extension of a database, a composite key value identifies one and only one object in the universe of discourse, but that extension represents a snapshot of the universe at a particular moment in time, so it may /not/ be the case that the appearance of that same set of values in another extension represents the same object. Consider again a table where the entire heading is the key. Does an update represent the selection of a new object in the universe of discourse, or does the new value represent the current state of affairs for the same object?

So, the LINE1 CON could better be stated in this way: The meaning represented by a composite key is not invariant with respect to time, whereas the meaning represented by a unary key is.

> CONS
> Introducing a part of subjectivity in key generation
> process
> DBMS_Plumber: Increases risk of concurrency problems

>

> LINE3>
> PRO
> ?
> CONS
> Keys implemented are not primary key implementations
> Product oriented bias for definition of keys
> All consequences arising for not having identity
>
Received on Sat May 26 2007 - 17:14:14 CEST

Original text of this message