Re: Guidelines to a decent support of surrogate key implementation

From: Brian Selzer <brian_at_selzer-software.com>
Date: Fri, 25 May 2007 02:34:48 GMT
Message-ID: <cXr5i.4043$y_7.2845_at_newssvr27.news.prodigy.net>


"DBMS_Plumber" <paul_geoffrey_brown_at_yahoo.com> wrote in message news:1180030480.560699.111200_at_o5g2000hsb.googlegroups.com...
> On May 24, 9:27 am, "Brian Selzer" <b..._at_selzer-software.com> wrote:
>
>> CON
>> Keys can change over time. Consider the following:
>>
>> R{A, B, C} | AB --> C, and several other relations referencing R[AB]. R:
>> {A:1, B:2, C:3}
>> {A:2, B:2, C:2}
>>
>> T1: User 1 reads row {A:1, B:2, C:3}, starts a long-running calculation.
>>
>> T2: User 2 updates row {A:1, B:2, C:3} to {A:3, B:2, C:3}; updates cascade
>> throughout the database. R:
>> {A:3, B:2, C:3}
>> {A:2, B:2, C:2}
>>
>> T3: User 3 updates row {A:2, B:2, C:2} to {A:1, B:2, C:3}; updates cascade
>> throughout the database. R:
>> {A:3, B:2, C:3}
>> {A:1, B:2, C:3}
>>
>> T4: User 1 updates row {A:1, B:2, C:3} to {A:1, B:2, C:4} -- BUG!!!!
>
> All your example does is show that users who don't understand
> isolation levels will create bugs.
>

No, it is representative of the way data in a database is used and updated under optimistic concurrency control. It is also representative of the way data from a database is used and updated through an ADO disconnected Recordset or an ADO.NET DataSet.

First the data is read; then the function is computed; then the data is read again and locked and, provided it hasn't changed, the update is committed.
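
For concreteness, here is a rough sketch of that read / compute / re-check / update cycle in Python against SQLite, replaying the data from my example above (the table name r, the lowercase columns a, b, c standing for A, B, C, and the SQLite backend are just illustrative choices, not anything prescribed by the example):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (a INTEGER, b INTEGER, c INTEGER, PRIMARY KEY (a, b))")
conn.executemany("INSERT INTO r VALUES (?, ?, ?)", [(1, 2, 3), (2, 2, 2)])
conn.commit()

# T1: User 1 reads the row and remembers the values seen.
a0, b0, c0 = conn.execute("SELECT a, b, c FROM r WHERE a = 1 AND b = 2").fetchone()

# ... User 1's long-running computation runs here, with no locks held ...

# T2, T3: other users change the first row's key, then give the second row
# the key and C value the first row used to have.
conn.execute("UPDATE r SET a = 3 WHERE a = 1 AND b = 2")
conn.execute("UPDATE r SET a = 1, c = 3 WHERE a = 2 AND b = 2")
conn.commit()

# T4: User 1's optimistic write re-checks that the row still carries the
# values read at T1 before applying the change.
cur = conn.execute("UPDATE r SET c = ? WHERE a = ? AND b = ? AND c = ?", (4, a0, b0, c0))
if cur.rowcount == 1:
    conn.commit()    # check passes, yet the matched row descends from {A:2, B:2, C:2}
else:
    conn.rollback()  # a conflict was detected; retry or report it

The re-check at T4 compares values only, so it cannot tell the row read at T1 apart from a row that has since come to carry the same values -- which is exactly the bug above.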

> If User 1 was planning to update the tuple they had read, then they
> damn well better ensure that their read is repeatable. This would
> block User 2 and User 3.
>

So if User 1 executes a long-running computation that takes several hours, should the locks be held for the entire time?
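
For contrast, here is roughly what the pessimistic version would have to look like (again only a sketch, using SQLite's BEGIN IMMEDIATE to take the write lock up front; other systems would use an isolation level or SELECT ... FOR UPDATE, and the file name shared.db is just for illustration):

import sqlite3, time

conn = sqlite3.connect("shared.db", isolation_level=None)  # autocommit; transactions managed explicitly
conn.execute("CREATE TABLE IF NOT EXISTS r (a INTEGER, b INTEGER, c INTEGER, PRIMARY KEY (a, b))")

conn.execute("BEGIN IMMEDIATE")     # from here on, every other writer is blocked
row = conn.execute("SELECT a, b, c FROM r WHERE a = 1 AND b = 2").fetchone()

time.sleep(5)                       # stand-in for the hours-long computation

if row is not None:
    conn.execute("UPDATE r SET c = ? WHERE a = ? AND b = ?", (row[2] + 1, row[0], row[1]))
conn.commit()                       # only now can anyone else write

Every other writer waits (or times out) for as long as the computation runs, which is precisely the objection.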

> How is this any different than the case where you choose to
> concatenate R[A,B] into a single attribute? Let me repeat your example

Your example was not representative, so I fixed it.

[snipped]

R { A, B } A->B

[ A:1, B:1 ]
[ A:2, B:2 ]

T1: User 1 Read(A:1,B:1) (Wander off to compute F(B:1) )
T2: User 2 Update(A:1,B:1) -> (A:3,B:1)
T3: User 3 Update(A:2,B:2) -> (A:1,B:1)
T4: User 1 Update(A:1,B:1) -> (A:1, F(B:1) )  <-- NOT A BUG!

Why is it not a bug? Because (A:1, B:1) means the same thing at T4 that it did at T1: A:1 represents the same entity at time T4 that it did at time T1. Contrast this with T4 in my original example: (A:1, B:2, C:3) at time T4 refers to a different entity than (A:1, B:2, C:3) at time T1 did.

>
> Let me also say that the biggest PRO of Cimode's Approach # 1 is
> purely performance. It would be simpler and smaller (fewer indices)
> and would make all kinds of nice things possible wrt.physical query
> plans.
>

Simpler, maybe. Smaller, doubtful. The indexes would still be required regardless of whether they're maintained internally by the system or specified explicitly.

What things?

> Why has it not been done? Because making this change involves
> serious surgery to the DBMS. Try getting that past the bean
> counters ....
>
