Re: Surrogate Keys: an Implementation Issue

From: JOG <>
Date: 31 Jul 2006 12:59:47 -0700
Message-ID: <>

Brian Selzer wrote:
> I prefer to think of a surrogate as more of a temporal attribute. While its
> value identifies a proposition in the database, it also indicates that a new
> proposition became known by the database, allowing it to become part of the
> discussion.

A logfile will indicate when a relvar has a different relation value in it. One doesn't need a surrogate for that.

> What do I mean by that? Surrogates make it possible to
> determine whether totally new information is being added to the body of
> knowledge or that existing information has had to be altered to reflect
> changes in the circumstances underpinning that information.

Totally new information? All updates add totally new information. It's a totally new relation each time you make an update. You are confusing RT with the entity/relationship model. The RT is not concerned with entities but statements of fact. I recommend having another look at the ORM as it helped me shift out of that mindset.

> The Relational Model is insufficient in that regard because it does not require that
> primary keys be immutable.
> I think that surrogates should be hidden from end-users, but that is more of
> a best practice than a requirement and should be imposed upon and enforced
> by applications and database administrators. I disagree with the notion
> that surrogates should "leak" into the real world. They should be
> accessible to applications in order to prevent the update problems I've
> described in an earlier post, but their values should never appear on a data
> entry screen or document. There are many reasons for this, here are a
> couple:
> (1) Anything that an end-user can see is subject to a change order. For
> example, a customer may object to being referred to by the number "666."
> (2) It may be necessary to take a database offline and change the physical
> representation or values for surrogates, such as is the case when databases
> are merged. If a user has written down the key value, he'll complain that
> he can't find the information anymore.

If you hide it, it is a row address. A pointer, if one at the logical level.

> There should be a second candidate key on every relation. The difference
> between OIDs and surrogates is that OIDs are pointers that reference an
> instance of an object while a surrogate indicates that a statement of fact
> became known.

OID's reference an instance of a class, not an object, but you are correct that they are pointers. Yet if you hide a surrogate, then there is little difference, it becomes a pointer to a proposition - the field itself is no part of that real world proposition, it merely addresses it. Again I honestly believe it is thinking in terms of entities that generates the confusion.

> This means that in an object store there can be object
> instances that are identical with exception of the object identifier.

And with a hidden surrogate one can record identical real world propositions, with the exception of that row identifier. Its the same mistake imo.

> While it may be interesting to be able to count the number of identical object
> instances, the same proposition can't have become known more than once
> within the same database state. Relations are sets of propositions, so it
> doesn't make any sense to allow duplicates, since something is either known
> to be true or not.

Neither does it make sense to permit duplicate statements of fact, by using a hidden surrogate.

> Therefore, I disagree with the idea of using a surrogate
> without a second candidate key, even if that key includes every other
> attribute in the relation. A surrogate is not a surrogate without a second
> candidate key.
> "JOG" <> wrote in message
> > This discussion has illuminated me, and I would like to share that. A
> > row is of course merely a proposition, and is identified by the
> > predicate fields that make each proposition in that set unique. A
> > surrogate key similar helps us identify a particular row, so there
> > initially seems no problem there, and its not an addressing issue.
> > Rather the issue is that the surrogate _does not exist_ in the real
> > world, it is initially an artifice of the database, and is not part of
> > the original proposition.
> >
> > So whats it doing there? Consider two people with the same names:
> >
> > Ex ( Name(x, "Sarah Jones") & Marital_status(x, married) ) &
> > Ey ( Name(y, "Sarah Jones") & Marital_status(y, married) ) &
> > x != y
> >
> > We don't have sufficient information to distinguish x and y (note this
> > specifically refers to the propositions, not the conceptual entities
> > involved). Perhaps this distinguishing information is impossible to
> > record, or simply cost ineffective to do so. Anyhow, If we had a
> > mechanism that seperated propositions x and y (an OID for example),
> > then how on earth would we know which one to update when the time came?
> > (Obviously we couldn't, and thats part of the problem with OID's.)
> >
> > But are surrogates in RM any different? My initial opinion was that the
> > answer was NO, that they seem to be no different to OID's, except in
> > their encoding. However gradually my opinion has changed. The reason
> > for this is that while an OID is always physical, a surrogate key leaks
> > into the real world almost as soon as it is generated. While it is
> > initially an artifice, it quickly becomes a real attribute, and that
> > validates it's use as part of the logical model. Hence the rows become:
> >
> > Ex ( EmpID(x, 1) & Name(x, "Sarah Jones") & status(x, married) ) &
> > Ey ( EmpID(x, 2) & Name(y, "Sarah Jones") & status(y, married) )
> >
> > Noone can tell you what their OID in some database is, but they may
> > well be able to tell you what their employee/payroll number is.
> >
> > It hence seems to me that it is the leak in to the real world that is
> > absolutely _vital_ to validate surrogate use.
> >
Received on Mon Jul 31 2006 - 21:59:47 CEST

Original text of this message