Re: Normalization, Natural Keys, Surrogate Keys

From: Kai Ponte <cybermusicdude_at_aol.com>
Date: 21 May 2002 10:30:13 -0700
Message-ID: <c1ec9b8f.0205210930.211d68c9_at_posting.google.com>


Galen Boyer <galendboyer_at_yahoo.com> wrote in message news:<usn4mxfa2.fsf_at_rcn.com>...
> On Mon, 20 May 2002, bap_at_shrdlu.com wrote:
>
> > The surrogate key has its uses (others disagree on that) but you
> > do have to recognise that there are risks. Any move away from the
> > logical data structure has a risk. It's perfectly possible to
> > take two copies of the same datum and attach two different
> > surrogate key values.
>
> Well, its perfectly possible to attach the wrong ssn to a person's row.
> How can you be assured that the correct ssn gets tagged to the correct
> person? The arguments against surrogate keys haven't helped me change
> my mind, although I wasn't the one asking. It should be quite obvious

Although I believe we're digressing here...

In reflection I tend to agree. By using an artificial surrogate key - aside from making the database easier to read - we avoid the problem of potenially changing data. What if, for example, the person providing the SSN provided a false SSN? Does that happen? Yes. Often.  What if the person doesn't have an SSN? Then the person later gets one. If that were the case the data with the invalid or temporary SSN could cause maintenance headaches with joined tables.

The problem with using the natural keys - in these examples - is that the business data is exposed to the logical design. By exposing the business data to the design we will create a maintenance nightmare if (when) that business data changes. By using a surrogate Key - say RowID - we avoid this maintenace issue. Take, for example, the following data structure (not developed by me) from my county's resource directory database. (http://help.sbcounty.gov)

Resource
RowID, AgencyName, AddressStreet, AddressCityID....

Service
RowID, Name

ResourceService
RowID, ResourceID, ServiceID

The primary keys in each table are the RowID. Because no business data is exposed as a primary key, I'm free to add and delete items as I feel fit. The table which joins a resource to a service does so using the identifier from the service table and the resource table.

First off, I have many resources with the same name - i.e. Salvation Army or Red Cross. Since each has a unique ID, I don't have to worry about what name the resource is given. If there is a duplicate the administrator will recognize it immediately and remove it. I could have used the address as part of the primary key, but...

If - at any time - I wish to make a structure change to either the Resource or the Service table then I simply do so without changing the RowID. As long as the rowID remains intact, I have no problem changing the tables. Does this happen? Yes. I recently received a request to update the Resource table by adding an entirely new address field. The user wanted a mailing address added to the street address. So I changed the address fields and added the mailing address. It was not difficult.

Having written all that - and hopefully I made sense - I have discussed with some of my co-workers the subject and we can see a clear example of where a natural key would be beneficial. In terms of data warehousing applications, I can imagine that a natural key will provide for quicker searches than a surrogate key would.

Thanks for all the input so far. This has been a great thread! Received on Tue May 21 2002 - 19:30:13 CEST

Original text of this message