Re: Primary vs. Surrogate! What a nightmare debate.

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Sun, 17 Oct 2004 09:46:23 -0400
Message-ID: <f3ttkc.6nd.ln_at_mercury.downsfam.net>


Kostas wrote:

> I must be reading all day on the issue and I am more confused then when I
> started.
> I started my db's with natural keys as I initially was exposed to the
> theory of relational db's and wasn't just assigned the task of building
> something quick and dirty.
> Then I reverted to surrogates for a while, and now, that I made my
> comeback designing some more serious systems (but still relatively small)
> I decided to settle this once and for good.
>
> My idea was that I should be consistent, that is, use either surrogate or
> natural but not both. Well, it turns out that sometimes there just isn't a
> natural key good enough for a relation, and that hundreds of so called
> experts have extremely diverging opinions. To top that, both sides'
> arguments look valid.
>
> My conclusion is that for small applications try to go with the natural
> key when it is readily available, but for large-scale warehouse
> applications surrogates should be seriously considered. At some points,
> this looks like a decision that has to be taken on an individual per/table
> basis.
>
> I was just wondering what the tendency among you people is...since this ng
> has "theory" in it I am daring to guess where your vote goes to, but I had
> to inquire nevertheless.
>

As President Clinton might say, "I feel your pain." Especially where "experts" are so strongly opposed.

My philosophy: The best theory is a working a system.

That being said, I have found that the following rules of thumb allow me to produce working systems:

  1. All tables get a meaningless integer unique column that is used "under the hood" and never seen by users. This is an identity column or in postgreSQL the "oid" for "Object ID". This is my column, it is there to make my life easier, nobody else's. Regardless of any theory, I use it solely to code UPDATE and DELETE statements in code libraries, so that at least some library routines do not have to be table-specific.

Absolutely never use the meaningless integer as a foreign key.

2. Some tables are reference tables, such as the list of customers, vendors, employees and so forth. I give these a character unique key that is entered by the user but otherwise meaningless. So you can have keys in the vendors table like "PHONE_CO" and "LIGHTS" and "SEARS" and "OFFICEMAX" and so forth. This the primary key that is placed into other tables as the foreign key.

This allows the user to accidentally enter the phone company twice, but you actually cannot prevent that no matter what you do, so trying to prevent that is a distraction.

The fact that the users entered the values themselves I have found is more valuable than using system-generated integers. Users more readily recognize and accept them.

3. Some tables are "documents", a term I use myself that is not in general use elsewhere. These are the sales order header/detail table pair, same for purchase orders, and so forth. The Sales Order gets a system-generated numeric sequence, order 1000, order 1001 and so forth. Assignment of this key is automatic.

4. Some tables are transactions, such as manual adjustments to inventory, manual GL entries, or a collection of posting information from AP to GL. These, like documents, get a system-generated integer for each row. You can also group them together by giving each logical unit a batch number. Transactions are always leaf tables, their keys are never used as foreign keys in other tables.

5. Never pack a key, as in never have the key parsable so that it contains hidden secrets, such as Employee_ID "JSmith-2004" telling me that JSmith was hired in 2004. My hire date should be stored as a separate column.

6. As for natural keys, by the time you have a working systems it turns out they have nothing to offer. The only theory I can suggest for why they seem to have little practical use is that correct uniqueness cannot be determined by a computer, so it requires judgement, so you have to let the human do it. This is how the system above came about.

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Sun Oct 17 2004 - 15:46:23 CEST

Original text of this message