Re: Normalization, Natural Keys, Surrogate Keys

From: Bernard Peek <bap_at_shrdlu.com>
Date: Thu, 23 May 2002 17:16:50 +0100
Message-ID: <thi0A4LyXR78EwmP_at_shrdlu.com>


In message <uvg9gu3uo.fsf_at_rcn.com>, Galen Boyer <galendboyer_at_yahoo.com> writes
>On Tue, 21 May 2002, bap_at_shrdlu.com wrote:
>> In message <usn4mxfa2.fsf_at_rcn.com>, Galen Boyer
>> <galendboyer_at_yahoo.com> writes
>>
>>>> If you allow this to happen then your data has been corrupted, and
>>>> corrupted in a way that's not immediately obvious.
>>>
>>>What happens if a customer id is email address? There are apps around
>>>where you can't get a natural, non-changing primary key for the
>>>customer. But, a completely logical identifier for, say, an online
>>>app could be an email address, or maybe even a credit card. But, this
>>>identifier can change, or they could have more than one. What do you
>>>do then?
>>
>> You find out who decided to use e-mail address as a unique identifier
>> and tell them to fix it before they leave.
>
>I wonder what yahoo's id is? How bout Amazon's?

I can answer for Amazon. It's a combination of an e-mail address and a cookie and a password.

>
>> I have multiple entries in several on-line databases because I use
>> multiple e-mail addresses. This is fine for me as long as I can
>> remember which address I used where and when.
>
>Well, then the email address can become a natural key, and a single
>surrogate key can reference multiple keys.

If I forget to renew the payments on this domain then someone else could take it over and generate e-mail using this address. E-mail addresses are not natural keys.

>
>> A common fix is to have the customer generate their own identifier as
>> a login ID and just check to make sure that it is unique within your
>> own database.
>
>Well, that is alot harder to remember than an email address, IMO.

But each customer only has to remember one key and they can choose one that is easy for them to remember.

>
>> You can't prevent a customer from using multiple IDs if they choose,
>> so don't bother trying. You might de-duplicate the data later. There
>> are companies that exist to do nothing else but de-duplicate lists of
>> names.
>>
>>>
>>>Or, is the whole surrogate/natural argument centered around creating
>>>keys for data that already has "natural" keys and has nothing to do
>>>with new apps/data?
>>
>> We've been through this one before in comp.databases.theory and I
>> proposed that we use latitude, longitude, altitude and birth-time as a
>> natural key for people. Anything else is a surrogate. We can't usually
>> get a real natural key for people so surrogates are all we have.
>>
>> Other types of data have natural keys and I would strongly advise
>> using them wherever possible. I know that there are people who always
>> use surrogate keys even when they have natural keys available. I hope
>> I don't have to use any of the databases they designed.
>
>Give me a break. This is just cocky for anyone to say. Both can be
>designed quite well.

Pardon the hyperbole. There are costs to using surrogate keys. Each one increases the complexity of the system which necessarily increases the risk of bugs. Any database that uses a surrogate key when there is a suitable natural key has been designed sub-optimally.

-- 
Bernard Peek
bap_at_shrdlu.com

In search of cognoscenti
Received on Thu May 23 2002 - 18:16:50 CEST

Original text of this message