Re: what are keys and surrogates?

From: JOG <>
Date: Fri, 21 Dec 2007 09:52:59 -0800 (PST)
Message-ID: <>

On Dec 20, 9:06 pm, (rpost) wrote:
> JOG wrote:
> >Let's see what people make of the following definitions. I'm going to
> >distinguish the use of the term 'key' as used in different layers as:
> >1) A 'Key' is a set of attributes, used at the logical layer to
> >uniquely identify a proposition.
> >2) An 'Identifier' is a set of attributes, used at the conceptual
> >layer to uniquely identify an entity.
> Why use two different terms?

Because they apply to different things, propositions and entities. Don't confuse the two.

> The definition I was taught is different: a key is a constraint on
> a relation, namely a set of attributes such that any two different
> tuples in the relation differ in at least one of these attributes.

Makes sense, but whoever mentioned relations? My definitions were generalized, not specific to E/R or RM.

> This is quite different from your notion,

I don't see the difference. Specifying identifying attributes over a set of propositions will be equivalent to specifying a constraint over a relational encoding.

> which says something about
> the relationship between the keys in your model and something outside
> of your model. (Of course there is a connection; but it's not the same.)

? Who has mentioned anything about relationships between keys? I have certainly said nothing about such things and I worry about why you would put such words in my mouth?

> >* A 'Surrogate Identifier' is an attribute designated at design time
> >to stand in place of natural identifiers that are either: (i)
> >Impractical to observe, (ii) Impractical to manipulate.
> My try: an "artificial" identifier is an relation attribute
> that is a key, and on which equality is the only meaningful operation.
> (I.e. it only serves to cross-identify the tuples of the relation with
> something else, e.g. tuples in another relation, or something outside
> the database).
> A "surrogate" identifier is a *hidden* artificial
> identifier, i.e. even the database queries only use them in equalities.

No attributes should ever be hidden. Ever. If they are, they can be deleterious to the system. Myself and Brian had a /long/ debate about this in the past, and one of the reasons I still consider consider his points with diligence and an open mind, is that he showed a lot of balls towards the end when he acceded the logic of that point - although we heartily debate many others still ;)

I certainly can't face going through that battle all again I'm afraid, safe to say that as a programmer I understand the intuition behind what you think (after all I believe Codd made the same error in RM/T), but know that it is a mistake and can end up in situations where the database gets screwed up.

> (So unless we have specific value domains to safeguard this, this is
> a property of the queries we use rather than the attribute itself.)
> >* Once a 'Surrogate Identifier' has been created it merely becomes an
> >unfamiliar 'Natural Identifier' (it too must become an observable
> >attribute of the entity it applies to).
> This is not true: they can serve as foreign keys without ever being
> published (if your queries are careful enouygh). I'd like to use
> a separate term for arbitrary but published identifiers.

As I previously stated, no attributes should ever be hidden. A tuple is merely an encoded proposition. That proposition orginates prior to entry in the database, and as such all of its contents come from the outside world. In another thread I used a car example to argue against OID's, but one might have used a hidden surrogate instead of a VIN attribute, and ended up with the same broken database.

> >* A 'Surrogate key' is merely a key that contains an attribute that
> >was a surrogate identifier at the conceptual layer.
> For me there is no such thing as a surrogate identifier at the
> conceptual layer. A surrogate is a "1NF" counterpart of an
> arrow in a conceptual model.

Surrogate means in "place of". Your phone company quite sensibly chooses a customerID for you "in place of" of your naturally identifying dna code. Once the surrogate identifier is created it becomes another statable natural attribute (and indeed the phone company asks for it when you call up to complain about your bill). It is unfamiliar compared to your other attributes, an no doubt like me you have to check the top of your bills to find out what it is. And this all has nothing to do with databases, it all happens outside.

Surely that is all straightforward and entirely sensible!

Merry Xmas, J.

> >Any comments?
> --
> Reinier
Received on Fri Dec 21 2007 - 18:52:59 CET

Original text of this message