Re: Guidelines to a decent support of surrogate key implementation

From: Brian Selzer <brian_at_selzer-software.com>
Date: Thu, 24 May 2007 16:27:14 GMT
Message-ID: <C1j5i.9436$RX.9306_at_newssvr11.news.prodigy.net>


"Cimode" <cimode_at_hotmail.com> wrote in message news:1179993531.222375.136310_at_o5g2000hsb.googlegroups.com...
> Lately, I have brought emphasis on the subject of concatenated keys vs
> addition of column. I have noticed then that the thread quickly
> turned to a debate about the failure of current dbms's to bring a
> decent support of surrogate key implementation. From what I observe 3
> line of thoughts trends appear:
>
> LINE1> People who think that surrogate key should be an internal
> physical mechanism to the dbms and invisible to designer who could
> focus on the logical selection of primary keys

CON
Keys can change over time. Consider the following:

R{A, B, C} | AB --> C, and several other relations referencing R[AB]. R: {A:1, B:2, C:3}
{A:2, B:2, C:2}

T1: User 1 reads row {A:1, B:2, C:3}, starts a long-running calculation.

T2: User 2 updates row {A:1, B:2, C:3} to {A:3, B:2, C:3} updates cascade throughout the database. R:
{A:3, B:2, C:3}
{A:2, B:2, C:2}

T3: User 3 updates row {A:2, B:2, C:2} to {A:1, B:2, C:3} updates cascade throughout the database. R:
{A:3, B:2, C:3}
{A:1, B:2, C:3}

T4: User 1 updates row {A:1, B:2, C:3} to {A:1, B:2, C:4} -- BUG!!!!

The rows that referenced {A:1, B:2, C:3} at time T1 reference {A:3, B:2, C:3} at time T4, and the rows that referenced {A:2, B:2, C:2} at time T1 reference {A:1, B:2, C:3} at time T4. User 1 cannot detect that a change occurred since the row {A:1, B:2, C:3} is identical to the one he read out at T1!

Note that this anomaly cannot occur when a key is unary because domains are (or at least should be) invariant. As a result, the entity represented by a unary key with value v1 is for all time necessarily different from an entity represented by a unary key with value v2. But while domains are invariant, combinations of domains aren't. Such combinations are represented by the extension of a relation, which can be different at different points in time. So, since composite keys can also be updated, the entity represented by a key composed of values {K1:v1, K2:v2} at time T1 can be the entity represented by a key composed of values {K1:v3, K2:v2} at time T2.

One could argue that the possible combinations of values from a collection of domains is invariant, but I don't agree with that line of thought. If a combination of values from a collection of domains defines an entity for all time, then that should be stated explicitly: a composite or relation-valued domain should be explicitly defined in the schema. That way there is a clear separation between what is assumed to be true for all time and what is assumed to be true at any arbitrary point in time.

One could also argue that the the specification of a composite candidate key has the same effect as specifying a key with a composite domain, but I don't agree with that either, due to the definition of a candidate key. A candidate key specification is a state constraint, meaning that it applys to each possible extension of a relation schema individually and without regard to any other possible extension.

A system-generated surrogate is always unary, and as a result, its values can represent an entity for all time.

> LINE2> People who think that a trdbms should allow the designer to
> have some control
> LINE3> People who think there is nothing wrong with current surrogate
> key implementations
>
> I personally belong to first category.
>
> What do you think are the Pro/Cons of each approach? Here is mine:
>
> LINE1>
> PRO
> Designer can focus better on logical design. Less error prone in
> key implementation
> Independence between the logical and physical layer is implemented
> at design time.
>
> CON
> Surrogate key implementation becoming internal addressing pointers,
> users can not find any more familiarity in surrogate keys. Assuming
> familiarity is an advantage, that would become a CON.
>
> LINE2>
> PRO
> Familiarity
> CONS
> Introducing a part of subjectivity in key generation process
>
> LINE3>
> PRO
> ?
> CONS
> Keys implemented are not primary key implementations
> Product oriented bias for definition of keys
> All consequences arising for not having identity
>
>
> Thank you for bringing your insights onto this theme. Thank you for
> respecting the frame/structure for conciseness's sake.
>
> Regards all...
>
Received on Thu May 24 2007 - 18:27:14 CEST

Original text of this message