Re: Guidelines to a decent support of surrogate key implementation
Date: Sat, 26 May 2007 15:14:14 GMT
Message-ID: <a9Y5i.2427$5j1.2316_at_newssvr21.news.prodigy.net>
"Cimode" <cimode_at_hotmail.com> wrote in message
news:1180031263.364874.174190_at_q75g2000hsh.googlegroups.com...
> On 24 mai, 20:14, DBMS_Plumber <paul_geoffrey_br..._at_yahoo.com> wrote:
>> On May 24, 9:27 am, "Brian Selzer" <b..._at_selzer-software.com> wrote: >> >> >> >> > CON >> > Keys can change over time. Consider the following: >> >> > R{A, B, C} | AB --> C, and several other relations referencing R[AB]. >> > R: >> > {A:1, B:2, C:3} >> > {A:2, B:2, C:2} >> >> > T1: User 1 reads row {A:1, B:2, C:3}, starts a long-running >> > calculation. >> >> > T2: User 2 updates row {A:1, B:2, C:3} to {A:3, B:2, C:3} updates >> > cascade >> > throughout the database. R: >> > {A:3, B:2, C:3} >> > {A:2, B:2, C:2} >> >> > T3: User 3 updates row {A:2, B:2, C:2} to {A:1, B:2, C:3} updates >> > cascade >> > throughout the database. R: >> > {A:3, B:2, C:3} >> > {A:1, B:2, C:3} >> >> > T4: User 1 updates row {A:1, B:2, C:3} to {A:1, B:2, C:4} -- BUG!!!! >> >> All your example does is show that users who don't understand >> isolation levels will create bugs. >> >> If User 1 was planning to update the tuple they had read, then they >> damn well better ensure that their read is repeatable. This would >> block User 2 and User 3. >> >> How is this any different than the case where you choose to >> concatenate R[A,B] into a single attribute? Let me repeat your example >> >> R { A, B } A->B >> >> [ A:1, B:1 ] >> [ A:2, B:2 ] >> >> T1: User 1 Read(A:1,B:1) (Wander off to compute F(B:1) ) >> T2: User 2 Update(A:1,B:1) -> (A:3,B:1) >> T3: User 3 Update(A:2,B:2) -> (A:1,B:2) >> T4: User 1 Update(A:1,X) -> (A:1, F(B:1) ) <-- Bug >> >> Let me also say that the biggest PRO of Cimode's Approach # 1 is >> purely performance. It would be simpler and smaller (fewer indices) >> and would make all kinds of nice things possible wrt.physical query >> plans. >> >> Why has it not been done? Because making this change involves >> serious surgery to the DBMS. Try getting that past the bean >> counters .... >
> Let me summarize the latest arguments put...
>
> LINE1> People who think that surrogate key should be an internal
> physical mechanism to the dbms and invisible to designer who could
> focus on the logical selection of primary keys
> LINE2> People who think that a trdbms should allow the designer to
> have some control
> LINE3> People who think there is nothing wrong with current surrogate
> key implementations
>
> I personally belong to first category.
>
> What do you think are the Pro/Cons of each approach? Here is mine:
>> error prone in
> LINE1>
> PRO
> Designer can focus better on logical design. Less
> key implementation
> Independence between the logical and physical layer is
> implemented
> at design time.
> DBM_Plumber: Performance enhanced because of better
> showplans
>> addressing pointers,
> CON
> Surrogate key implementation becoming internal
> users can not find any more familiarity in surrogate keys. Assuming
> familiarity is an advantage, that would become a CON.
>> of *stability* requirement for pk's)
> LINE2>
> PRO
> Familiarity
> Brian: Keys should be subject to change (a violation
Just to clarify: natural keys *can* change. Whether they *should* depends on the situation. Consider a table where the entire heading is the key. Does the stability requirement prohibit updates to such a table? The stability requirement applies less to keys than to the domains from which their values are drawn. It shouldn't be wrong for a user to select a different value in the key domain, but what should be avoided is the user changing the representation of that value within its domain. For example, it shouldn't be wrong to select Part 34253 instead of Part 23134, but what should be avoided is changing the symbol "34253" that represents a specific Part.
I think also that you missed the entire point I was trying to make. Let me put it another way. The intension of a database is universally quantified with respect to time, whereas the extension of a database is existentially quantified. Domains belong to the intension of a database, whereas combinations of domains are materialized in an extension. According to Codd [RM/V2, 1990], "the concept of keys in the relational model were always intended to identify objects in the micro-world that the database is supposed to represent. In other words, keys in the relational model act as surrogates for the objects being modeled." For unary keys, that identification is permanent, or independent of time, since domains belong to the intension of a database and thus are supposed to be invariant. So every time a value from a unary key's domain appears in any extension of the database, that value represents one and only one object in the universe of discourse. For composite keys, on the other hand, that identification is time dependent. Throughout a single extension of a database, a composite key value identifies one and only one object in the universe of discourse, but that extension represents a snapshot of the universe at a particular moment in time, so it may /not/ be the case that the appearance of that same set of values in another extension represents the same object. Consider again a table where the entire heading is the key. Does an update represent the selection of a new object in the universe of discourse, or does the new value represent the current state of affairs for the same object?
> CONS
> Introducing a part of subjectivity in key generation
> process
> DBMS_Plumber: Increases risk of concurrency problems
>>
> LINE3>
> PRO
> ?
> CONS
> Keys implemented are not primary key implementations
> Product oriented bias for definition of keys
> All consequences arising for not having identity
Received on Sat May 26 2007 - 17:14:14 CEST