Re: Surrogate Keys: an Implementation Issue

From: Roy Hann <specially_at_processed.almost.meat>
Date: Wed, 19 Jul 2006 17:08:06 +0100
Message-ID: <c8WdnTMOuPpnxSPZRVnyhA_at_pipex.net>


"Paul Mansour" <paul_at_carlislegroup.com> wrote in message news:1153323860.787392.82180_at_75g2000cwc.googlegroups.com...
> I'm working on a little RDBMs project, and I've been pondering the
> general arguments for and against surrogate keys, and have come to
> following conclusion:
>
> The primary key should indeed be a natural key, if one is available, at
> the model level, but at the implementation level the table should have
> a system-supplied, auto-incremented, and hidden, "super primary
> key". This super primary key, (SPK) is then used when populating FKs
> in other tables (again, hidden from the user). It is up to the DBMS to
> do the necessary conversions to insulate the user from knowledge of the
> SPK.
>
> If there is no natural key, then the SPK can simply be exposed to the
> user as the PK for that table.
>
> Thus, from the user's perspective, it appears that, say, Social
> Security Number is sprinkled across a dozen different tables, when
> under the covers its just meaningless 32 bit ints, and the SS number is
> only stored once in the Employee table.
>
> This seems to give all of the advantages of surrogate keys, (very easy
> to change, saves on space) and the advantages of natural keys
> (meaningful, etc), and conversely, none of the drawbacks of either. In
> other words, the best of both worlds.
>
> Is this a bad conclusion?

Yes. It creates a comforting delusion about what you know about the real world, as well as creating lots of extra work, and concealing correct information behind spurious indirection.

The example of SSN numbers constantly comes up in defense of using surrogate keys, because there are problems with them. However, the fact is that a database represents the state of the knowledge the business possesses about its enterprise of interest. The business certainly must know about the limitations of various natural keys because they are down there on the shop floor dealing with the reality every day. If they are presented with a number that purports to be a relevant piece of information (the SSN say) they are going to write it down--precisely because it is relevant. But they also know it is in some sense provisional. All they expect is to be able to correct it in the database so that the database continues to represent their knowledge of the state of the enterprise of interest--*even when they know their knowledge may still be wrong*.

The surrogate saves an inconsequential amount of work--namely propagating the change of value everywhere. But it also costs a lot of coding, testing, development discipline, and obscurity. Good design says we keep one fact in one place. If we propagate something to many places then it is only because it is a foreign key. If it is a foreign key then there should be a foreign key constraint. And if there is an FK constraint it can be declared ON UPDATE CASCADE (or whatever the equivalent incantation is in your DDL of choice). There, three words, job done. Forever.

I do sometimes concede that a surrogate is required, but it is only ever my last resort, never my first, and I would certainly disagree passionately if someone made the sweeping recommendation that my "table should have a system-supplied, auto-incremented, and hidden, 'super primary key'." One might as well use pointers otherwise--they'd do the same job, and probably a heck of a lot faster.

There is a reason why we abandoned that approach 35 years ago, and I am old enough to remember it. :-)

Roy Received on Wed Jul 19 2006 - 18:08:06 CEST

Original text of this message