Re: Normalization, Natural Keys, Surrogate Keys

From: Galen Boyer <galendboyer_at_yahoo.com>
Date: 21 May 2002 21:47:08 -0500
Message-ID: <uvg9gu3uo.fsf_at_rcn.com>


On Tue, 21 May 2002, bap_at_shrdlu.com wrote:
> In message <
usn4mxfa2.fsf_at_rcn.com>, Galen Boyer
> <galendboyer_at_yahoo.com> writes
>

>>> If you allow this to happen then your data has been corrupted, and
>>> corrupted in a way that's not immediately obvious.
>>
>>What happens if a customer id is email address?  There are apps around
>>where you can't get a natural, non-changing primary key for the
>>customer.  But, a completely logical identifier for, say, an online
>>app could be an email address, or maybe even a credit card.  But, this
>>identifier can change, or they could have more than one.  What do you
>>do then?

>
> You find out who decided to use e-mail address as a unique identifier
> and tell them to fix it before they leave.

I wonder what yahoo's id is? How bout Amazon's?

> I have multiple entries in several on-line databases because I use
> multiple e-mail addresses. This is fine for me as long as I can
> remember which address I used where and when.

Well, then the email address can become a natural key, and a single surrogate key can reference multiple keys.

> A common fix is to have the customer generate their own identifier as
> a login ID and just check to make sure that it is unique within your
> own database.

Well, that is alot harder to remember than an email address, IMO.

> You can't prevent a customer from using multiple IDs if they choose,
> so don't bother trying. You might de-duplicate the data later. There
> are companies that exist to do nothing else but de-duplicate lists of
> names.
>

>>
>>Or, is the whole surrogate/natural argument centered around creating
>>keys for data that already has "natural" keys and has nothing to do
>>with new apps/data?

>
> We've been through this one before in comp.databases.theory and I
> proposed that we use latitude, longitude, altitude and birth-time as a
> natural key for people. Anything else is a surrogate. We can't usually
> get a real natural key for people so surrogates are all we have.
>
> Other types of data have natural keys and I would strongly advise
> using them wherever possible. I know that there are people who always
> use surrogate keys even when they have natural keys available. I hope
> I don't have to use any of the databases they designed.

Give me a break. This is just cocky for anyone to say. Both can be designed quite well.

> For me there are three types of key and I use them in this order of
> preference.
>
> true natural keys guaranteed by the laws of physics.
>
> surrogate keys that I generate and therefore control.
>
> surrogate keys issued by *trusted* third-parties.
>
> Anything else shouldn't be completely trusted and perhaps shouldn't
> have a unique index in a database. If you choose to make the SSN (or
> any other third-party surrogate) a unique field in the database then
> you must have a procedure in place to handle duplicate values.

-- 
Galen deForest Boyer
Sweet dreams and flying machines in pieces on the ground.
Received on Wed May 22 2002 - 04:47:08 CEST

Original text of this message