Re: Surrogate primary key plus unique constraint vs. natural primary key: data integrity?

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Thu, 14 Mar 2013 13:01:28 -0400
Message-Id: <20130314130128.e84a2870.jklowden_at_speakeasy.net>


On Thu, 14 Mar 2013 00:38:36 +0100
Jan Hidders <hidders_at_gmail.com> wrote:

> > Application programmers usually invoke this argument, they find it
> > easier to deal with an integer key column.
>
> Sure. But that does not mean that all arguments in favor of surrogate
> identifiers are therefore complete nonsense, ...
> I'd like to see a bit more precise analysis on when and why they are
> a problem, rathern then a naive sweeping generalization that
> declarese them as always evil.

I said earlier in this thread that numbering things doesn't make them uniquer.

Almost every table I've ever seen that used a surrogate key failed to identify the natural key -- and the uniqueness constraint. The sweeping generalization declaring evil holds: if you're creating uniqueness by enumeration, you're doing it wrong.

So the first argument against is that adding a surrogate key, if done harmlessly, is *extra* work. The job of analyzing the data still needs to be done. Else duplicates will be permitted, even though the surrogate "ensures uniqueness".

Having recognized that it's extra work, why do it? That is, is there any good, nonreligious, engineering reason? There can be. There's no theoretical reason insofar as the theory speaks only of keys, not kinds of keys, and is silent on the subject of adding an identifier for convenience. But there can be an engineering reason.

I remember one time.

In 1997 I was working with a table with over a million rows (the horror!) whose natural key was 9 columns in my recollection. It was the biggest table on a server with 512 MB RAM and a 90 MHz processor. For us, it was the mother of all tables.

This table needed a history kept of its changes, although only the quantity column could change over time. Nightly processing also involved some self-joins. For reasons of efficiency, we added an ID column. It made certain operations faster, and some queries easier to write.

I reckon by now I've created thousands of tables. That's the only one I remember inventing an identifier for. We thought it was necessary at the time, and maybe it was.

--jkl Received on Thu Mar 14 2013 - 18:01:28 CET

Original text of this message