Re: Surrogate Keys: an Implementation Issue

From: David Cressey <dcressey_at_verizon.net>
Date: Sat, 29 Jul 2006 13:39:02 GMT
Message-ID: <WxJyg.1226$sy2.598_at_trndny01>


"Anith Sen" <anith_at_bizdatasolutions.com> wrote in message news:eaavt3$5c2$1_at_nntp.aioe.org...
> >> I doubt that there are any such rdbms's today. It is like criticism
of
> >> RT because of SQL shortcomings - the lack of faithful implementations
> >> only serves to encourage naive extensions to RT.
>
> The idea of hidden surrogates and associated myths is mistakenly nurtured
by
> many who are intimately familiar with current crop of SQL products. For
> instance, a cursory inspection of some sql programming newsgroups reveals
> several blatant claims like:
> -- Surrogates are immutable
> -- Surrogates should not be exposed the user
> -- Surrogates help performance
> -- Surrogates belong to the physical model
> -- Surrogates are never verifiable in reality
> -- ...
>
> In most cases, claimant misses the fundamental that a surrogate key is
> nothing but a simple, stable and generally an irreducible candidate key.
And
> that it has nothing to do with the physical model and that once used
outside
> the DBMS, it is verifiable as well.
>
> --
> Anith
>
>

I think Anith's comments above are correct.

I've been skipping over this thread for the most part, because the signal to noise ratio of the thread didn't look good to me. So my comment may duplicate what somebody else already wrote.

The elephant in the living room here is whether a surrogate key identifies a row one the one hand, or identifies a subject matter entity described by the (key) contents of the row on the other. Comments like "updated the wrong row" lead inescapably to the idea that there is a "right row", and that the programmer coding an update has the responsibility to identify the right row, and manipulate that row.

If surrogate keys are used as row identifiers, then they are not surrogates for natural keys. They are indirect addressing of the row, as a data structure. The indirection allows for a certain flexibility in reorganizing data physically, without losing linkages. But other than that, it largely misses the point of using either RDM or SQL. If you're going to link rows with other rows, you're back to the graph data model, whether you know it or not.

If on the other hand, surrogates are surrogates for natural keys, about all they provide is immutability in the face of natural key mismanagement, or real world collisions. That's very useful in some circumstances. But it doesn't allow Dave to update "the correct row". It allows Dave to update "the correct data".

An example of real world collisions is having a surrogate key for Employee_ID that will survive the merger of Delta Airlines with Epsilon Airlines. Data warehouses often use surrogate keys for this kind of purpose. Then again, data warehouses often avoid updates completely, except for error correction. So the issues raised by updates are moot in that context.

But this has nothing to do with the substance of most of this discussion (again, unless I missed a key part of the discussion). Received on Sat Jul 29 2006 - 15:39:02 CEST

Original text of this message