Re: Guidelines to a decent support of surrogate key implementation

From: Cimode <cimode_at_hotmail.com>
Date: 24 May 2007 11:27:43 -0700
Message-ID: <1180031263.364874.174190_at_q75g2000hsh.googlegroups.com>


On 24 mai, 20:14, DBMS_Plumber <paul_geoffrey_br..._at_yahoo.com> wrote:
> On May 24, 9:27 am, "Brian Selzer" <b..._at_selzer-software.com> wrote:
>
>
>
> > CON
> > Keys can change over time. Consider the following:
>
> > R{A, B, C} | AB --> C, and several other relations referencing R[AB]. R:
> > {A:1, B:2, C:3}
> > {A:2, B:2, C:2}
>
> > T1: User 1 reads row {A:1, B:2, C:3}, starts a long-running calculation.
>
> > T2: User 2 updates row {A:1, B:2, C:3} to {A:3, B:2, C:3} updates cascade
> > throughout the database. R:
> > {A:3, B:2, C:3}
> > {A:2, B:2, C:2}
>
> > T3: User 3 updates row {A:2, B:2, C:2} to {A:1, B:2, C:3} updates cascade
> > throughout the database. R:
> > {A:3, B:2, C:3}
> > {A:1, B:2, C:3}
>
> > T4: User 1 updates row {A:1, B:2, C:3} to {A:1, B:2, C:4} -- BUG!!!!
>
> All your example does is show that users who don't understand
> isolation levels will create bugs.
>
> If User 1 was planning to update the tuple they had read, then they
> damn well better ensure that their read is repeatable. This would
> block User 2 and User 3.
>
> How is this any different than the case where you choose to
> concatenate R[A,B] into a single attribute? Let me repeat your example
>
> R { A, B } A->B
>
> [ A:1, B:1 ]
> [ A:2, B:2 ]
>
> T1: User 1 Read(A:1,B:1) (Wander off to compute F(B:1) )
> T2: User 2 Update(A:1,B:1) -> (A:3,B:1)
> T3: User 3 Update(A:2,B:2) -> (A:1,B:2)
> T4: User 1 Update(A:1,X) -> (A:1, F(B:1) ) <-- Bug
>
> Let me also say that the biggest PRO of Cimode's Approach # 1 is
> purely performance. It would be simpler and smaller (fewer indices)
> and would make all kinds of nice things possible wrt.physical query
> plans.
>
> Why has it not been done? Because making this change involves
> serious surgery to the DBMS. Try getting that past the bean
> counters ....

Let me summarize the latest arguments put...

LINE1> People who think that surrogate key should be an internal physical mechanism to the dbms and invisible to designer who could focus on the logical selection of primary keys LINE2> People who think that a trdbms should allow the designer to have some control
LINE3> People who think there is nothing wrong with current surrogate
key implementations

I personally belong to first category.

What do you think are the Pro/Cons of each approach? Here is mine:

LINE1>

        PRO
                Designer can focus better on logical design.  Less
error prone in
key implementation
                Independence between the logical and physical layer is
implemented
at design time.
                DBM_Plumber: Performance enhanced because of better
showplans
        CON
                Surrogate key implementation becoming internal
addressing pointers,
users can not find any more familiarity in surrogate keys. Assuming familiarity is an advantage, that would become a CON.

LINE2>

        PRO
                Familiarity
                Brian: Keys should be subject to change (a violation
of *stability* requirement for pk's)
        CONS
                Introducing a part of subjectivity in key generation
process
                DBMS_Plumber: Increases risk of concurrency problems

LINE3>
        PRO
                ?
        CONS
                Keys implemented are not primary key implementations
                Product oriented bias for definition of keys
                All consequences arising for not having identity
Received on Thu May 24 2007 - 20:27:43 CEST

Original text of this message