Re: Natural keys vs Artificial Keys

From: <robur.6_at_gmail.com>
Date: Fri, 29 May 2009 01:05:07 -0700 (PDT)
Message-ID: <e08512d2-7e47-4fc9-b337-6a3911596576_at_r37g2000yqd.googlegroups.com>


On 15 Mai, 19:57, "Walter Mitty" <wami..._at_verizon.net> wrote:

> Every time the issue of natural keys comes up in Stackoverflow.com, the
> prevailing view seems to be that the best primary keys are artificial and
> opaque. And responses that take this view get the votes. They emphasize
> efficiency (mainly efficiency in the index that you usually get
> automatically when you declare a PRIMARY KEY). But data integrity seems to
> be forgotten here. The same is not generally true when the conversation
> turns to referential integrity. Slowly but surely the programming community
> seems to have been dragged, kicking and screaming, into turning over
> referential integrity enforcement to the DBMS.
>

A true story. Recently I was assigned to help some colleagues on a project. Looking through the source code, I discovered a number of design flaws, one of them related to the auto-number PK.

The application reads some log files and imports them into a table. It was hard to find a natural key, so the table got an auto-number PK. However, they discovered that if the process is restarted, log records are imported twice. So they added a column named “hash” and put a unique index on it. The column was initialized with an MD5 value computed in a Perl script.
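The pattern described above can be sketched roughly as follows (a hypothetical reconstruction, using SQLite and illustrative table and column names; the original used a Perl script against a different DBMS):

```python
import hashlib
import sqlite3

# Staging table with an auto-number PK plus a "hash" column under a
# unique index, so re-importing the same log line is rejected.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log_buffer (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        line TEXT NOT NULL,
        hash TEXT NOT NULL UNIQUE      -- MD5 of the line, used for dedup
    )
""")

def import_line(line: str) -> bool:
    """Insert a log line; return False if it was already imported."""
    h = hashlib.md5(line.encode()).hexdigest()
    try:
        conn.execute("INSERT INTO log_buffer (line, hash) VALUES (?, ?)",
                     (line, h))
        return True
    except sqlite3.IntegrityError:   # unique index on hash rejects duplicates
        return False

import_line("2009-05-29 01:05:07 login ok")
# Restarting the import replays the same line; the unique index stops it.
assert import_line("2009-05-29 01:05:07 login ok") is False
```

The auto-number `id` is still the declared primary key; the `hash` column is doing the actual duplicate detection on the side.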

After a while (what a surprise) they discovered that they were losing log records due to hash collisions. So they widened the “hash” column and initialized it with a value computed by concatenating several columns of the table. Problem solved.
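A hash built by concatenating several columns is really a composite natural key in disguise; a unique index declared directly over those columns does the same job without the intermediate string. A minimal sketch, again with hypothetical column names:

```python
import sqlite3

# Same staging idea, but the duplicate check is a composite unique
# constraint over the columns themselves instead of a computed "hash".
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log_buffer (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        logged_at TEXT NOT NULL,
        source    TEXT NOT NULL,
        message   TEXT NOT NULL,
        UNIQUE (logged_at, source, message)   -- composite natural key
    )
""")

def import_record(logged_at: str, source: str, message: str) -> bool:
    """Insert a record; return False if that exact record already exists."""
    try:
        conn.execute(
            "INSERT INTO log_buffer (logged_at, source, message) "
            "VALUES (?, ?, ?)",
            (logged_at, source, message),
        )
        return True
    except sqlite3.IntegrityError:   # composite unique constraint fired
        return False
```

Unlike the hash, this cannot collide: two rows conflict only when every keyed column matches.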

That table was only a temporary buffer. The data is further processed and transferred to a different database. The final target table, of course, has an auto-number PK. This time a different solution was chosen to avoid duplicates. In the source table they added a “processed” column that is supposed to be true if the data was transferred to the target table. That way the target table will “never” have duplicates, since processed data is marked and cannot be read twice. Again, problem solved.
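The “processed” marker scheme can be sketched like this (hypothetical table and column names; the point is that the copy and the mark have to happen in one transaction, or a crash between them reintroduces the duplicate problem):

```python
import sqlite3

# Staging buffer with a "processed" flag, plus the final target table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        line      TEXT NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE target (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        line TEXT NOT NULL
    );
""")

def transfer() -> int:
    """Copy unprocessed staging rows to the target; return how many moved."""
    with conn:  # one transaction: copy and mark together, or neither
        rows = conn.execute(
            "SELECT id, line FROM staging WHERE processed = 0").fetchall()
        for row_id, line in rows:
            conn.execute("INSERT INTO target (line) VALUES (?)", (line,))
            conn.execute("UPDATE staging SET processed = 1 WHERE id = ?",
                         (row_id,))
    return len(rows)

conn.execute("INSERT INTO staging (line) VALUES ('boot ok')")
transfer()
# A restarted transfer finds nothing unprocessed, so no duplicate arrives.
```

Note that the target table itself still has no key that would reject a duplicate; the guarantee lives entirely in the flag on the source side.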

Conclusion? You can live with an auto-number PK. If you happen to be bitten by duplicates, there are plenty of workarounds. Don’t bother with natural keys, especially composite ones. Theory is for nerds; what matters is that your application works... Really?

Received on Fri May 29 2009 - 10:05:07 CEST
