Re: A real world example

From: Brian Selzer <brian_at_selzer-software.com>
Date: Tue, 15 Aug 2006 16:14:39 GMT
Message-ID: <PpmEg.7253$%j7.4828_at_newssvr29.news.prodigy.net>


"JOG" <jog_at_cs.nott.ac.uk> wrote in message news:1155649903.685743.170520_at_i3g2000cwc.googlegroups.com...
> Just as it is good to describe a person by something that will remain
> consistent (for example telling someone that they will recognise 'john'
> by his 'green jumper' is not particularly useful if he does actually
> change his clothes once in a while), so in database design it is vital
> to pick a good candidate key. Desirable properties are:
>
> 1) very likely to maintain uniqueness over relation values.
> 2) very unlikely to change between relation values.
>
> The RM does not require that anything is immutable, but clearly,
> finding a key that tends to immutability is desirable for (2).
>
> It appears to me reading the posts in this thread, that everyone is
> agreed on these points, so I would like to suitably bang all your heads
> together, as in some instances you are all just agreeing loudly, just
> with the vociferous bluster of miscommunication.
>
> Given that all are agreed that surrogates are useful, this leaves the
> _only_ point that seem to be in contention:
>
My point is that the Relational Model has a serious limitation that can compromise integrity.
>
> "Should a surrogate be made hidden from the user"
>
>
> I'd contend no, absolutely not. Surrogates are instigated by real world
> difficulties in distinguishing items even if they are generated by a
> DBMS (in fact it does not matter a jot what generates them logically).
> Hence if they are hidden, it is a clear violation of the information
> principle, which is vital for the real world identification that
> precipitated the need for them in the first place.
>

I agree that it is a violation of the information principle. I also agree that if there's no other way to distinguish things, that is, if there is no relevant natural key, such as for distinguishing individual quarters in a new roll, then they should definitely be exposed to end-users. I don't agree that the lack of real world identification is the only reason to use surrogates; I contend that the fact that that real world identification is not *guaranteed* to remain constant precipitates that need. If there IS a natural key, then as a best practice, the surrogate should not appear on any data entry screen or report. In that case, the surrogate is only needed to work around the limitations of the Relational Model, and since the users have another means of identify something, they don't have a need to know.

Here's a simple example of what can happen:

create table P
(
 x int primary key,
 y int
)
create table F
(
 x int primary key references P(x),

 z int
)

insert P (x, y) values (1, 1)
insert P (x, y) values (2, 3)
insert P (x, y) values (3, 2)
insert F (x, z) values (1, 5)
insert F (x, z) values (2, 4)

select * from P join F on (p.x = F.x)
update P
 set x = case x

    when 1 then 2
    when 2 then 1
   end
 where x in (1, 2)
select * from P join F on (p.x = f.x)

RESULTS: Before:
x y x z
------ -------- ------- --------

1           1           1           5
2           3           2           4

After:
x           y           x           z
------ -------  -------- -------- 
1           3           1           5
2           1           2           4


Should the new values for x in P have cascaded into F? Assume that the system only has the before image and the after image (the first two columns in RESULTS) in order to complete the update. How can such a system differentiate between the above update and the following update?

update P
 set y = case y

    when 1 then 3
    when 3 then 1
   end
 where x in (1, 2)

Clearly first update affects a key, and consequently, the changes should cascade, but with the information available (both updates produce the exact same before and after images), the system cannot differentiate between the two updates; therefore, it cannot determine whether or not to cascade the changes. Updates within the Relational Model are are constrained in the same way as this hypothetical system. All that is available is the preceding instance and the succeeding instance, and if the only key can change, then there is no way to correlate tuples. Received on Tue Aug 15 2006 - 18:14:39 CEST

Original text of this message