Re: Surrogate Keys: an Implementation Issue

From: Brian Selzer <>
Date: Sun, 30 Jul 2006 16:13:12 GMT
Message-ID: <sU4zg.3169$>

I prefer to think of a surrogate as more of a temporal attribute. While its value identifies a proposition in the database, it also indicates that a new proposition became known by the database, allowing it to become part of the discussion. What do I mean by that? Surrogates make it possible to determine whether totally new information is being added to the body of knowledge or that existing information has had to be altered to reflect changes in the circumstances underpinning that information. The Relational Model is insufficient in that regard because it does not require that primary keys be immutable.

I think that surrogates should be hidden from end-users, but that is more of a best practice than a requirement and should be imposed upon and enforced by applications and database administrators. I disagree with the notion that surrogates should "leak" into the real world. They should be accessible to applications in order to prevent the update problems I've described in an earlier post, but their values should never appear on a data entry screen or document. There are many reasons for this, here are a couple:

(1) Anything that an end-user can see is subject to a change order. For example, a customer may object to being referred to by the number "666." (2) It may be necessary to take a database offline and change the physical representation or values for surrogates, such as is the case when databases are merged. If a user has written down the key value, he'll complain that he can't find the information anymore.

There should be a second candidate key on every relation. The difference between OIDs and surrogates is that OIDs are pointers that reference an instance of an object while a surrogate indicates that a statement of fact became known. This means that in an object store there can be object instances that are identical with exception of the object identifier. While it may be interesting to be able to count the number of identical object instances, the same proposition can't have become known more than once within the same database state. Relations are sets of propositions, so it doesn't make any sense to allow duplicates, since something is either known to be true or not. Therefore, I disagree with the idea of using a surrogate without a second candidate key, even if that key includes every other attribute in the relation. A surrogate is not a surrogate without a second candidate key.

"JOG" <> wrote in message
> This discussion has illuminated me, and I would like to share that. A
> row is of course merely a proposition, and is identified by the
> predicate fields that make each proposition in that set unique. A
> surrogate key similar helps us identify a particular row, so there
> initially seems no problem there, and its not an addressing issue.
> Rather the issue is that the surrogate _does not exist_ in the real
> world, it is initially an artifice of the database, and is not part of
> the original proposition.
> So whats it doing there? Consider two people with the same names:
> Ex ( Name(x, "Sarah Jones") & Marital_status(x, married) ) &
> Ey ( Name(y, "Sarah Jones") & Marital_status(y, married) ) &
> x != y
> We don't have sufficient information to distinguish x and y (note this
> specifically refers to the propositions, not the conceptual entities
> involved). Perhaps this distinguishing information is impossible to
> record, or simply cost ineffective to do so. Anyhow, If we had a
> mechanism that seperated propositions x and y (an OID for example),
> then how on earth would we know which one to update when the time came?
> (Obviously we couldn't, and thats part of the problem with OID's.)
> But are surrogates in RM any different? My initial opinion was that the
> answer was NO, that they seem to be no different to OID's, except in
> their encoding. However gradually my opinion has changed. The reason
> for this is that while an OID is always physical, a surrogate key leaks
> into the real world almost as soon as it is generated. While it is
> initially an artifice, it quickly becomes a real attribute, and that
> validates it's use as part of the logical model. Hence the rows become:
> Ex ( EmpID(x, 1) & Name(x, "Sarah Jones") & status(x, married) ) &
> Ey ( EmpID(x, 2) & Name(y, "Sarah Jones") & status(y, married) )
> Noone can tell you what their OID in some database is, but they may
> well be able to tell you what their employee/payroll number is.
> It hence seems to me that it is the leak in to the real world that is
> absolutely _vital_ to validate surrogate use.
Received on Sun Jul 30 2006 - 18:13:12 CEST

Original text of this message