Re: Normalization, Natural Keys, Surrogate Keys

From: Bernard Peek <>
Date: Tue, 21 May 2002 14:37:18 +0100
Message-ID: <>

In message <>, Galen Boyer <> writes

>> If you allow this to happen then your data has been corrupted, and
>> corrupted in a way that's not immediately obvious.
>What happens if a customer id is email address? There are apps around
>where you can't get a natural, non-changing primary key for the
>customer. But, a completely logical identifier for, say, an online app
>could be an email address, or maybe even a credit card. But, this
>identifier can change, or they could have more than one. What do you do

You find out who decided to use e-mail address as a unique identifier and tell them to fix it before they leave. I have multiple entries in several on-line databases because I use multiple e-mail addresses. This is fine for me as long as I can remember which address I used where and when.

A common fix is to have the customer generate their own identifier as a login ID and just check to make sure that it is unique within your own database. You can't prevent a customer from using multiple IDs if they choose, so don't bother trying. You might de-duplicate the data later. There are companies that exist to do nothing else but de-duplicate lists of names.

>Or, is the whole surrogate/natural argument centered around creating
>keys for data that already has "natural" keys and has nothing to do with
>new apps/data?

We've been through this one before in comp.databases.theory and I proposed that we use latitude, longitude, altitude and birth-time as a natural key for people. Anything else is a surrogate. We can't usually get a real natural key for people so surrogates are all we have.

Other types of data have natural keys and I would strongly advise using them wherever possible. I know that there are people who always use surrogate keys even when they have natural keys available. I hope I don't have to use any of the databases they designed.

For me there are three types of key and I use them in this order of preference.

        true natural keys guaranteed by the laws of physics.

        surrogate keys that I generate and therefore control.

        surrogate keys issued by *trusted* third-parties.

Anything else shouldn't be completely trusted and perhaps shouldn't have a unique index in a database. If you choose to make the SSN (or any other third-party surrogate) a unique field in the database then you must have a procedure in place to handle duplicate values.

Bernard Peek

In search of cognoscenti
Received on Tue May 21 2002 - 15:37:18 CEST

Original text of this message