Re: Normalization, Natural Keys, Surrogate Keys

From: Ed prochak <ed.prochak_at_alltel.com>
Date: 21 May 2002 10:44:58 -0700
Message-ID: <e51b160.0205210944.24596ca_at_posting.google.com>


Paul Tiseo <123tiseo.paul_at_123mayo.edu (drop the numbers)> wrote in message news:<MPG.174f38f14698dd6898970d_at_news.easynews.com>...
> In article <3ce567e8_3_at_news.teranews.com>, pablo_at_dev.null says...
> > Hi Ed,
> >
> > Seems like the issues that you're seeing could have been caused
> > regardless of whether surrogate keys were used or not. Seems like
> > what should have been done are the following items:
> >
> > * UNIQUE index on the natural key
That would help, but only if it is applied everywhere in the system (and the related systems). But if the natural key had been used as the PK, the uniqueness would be assured also.

> > * Implement RI to ensure no orphaned rows and ensure valid
> > relationships
In most cases the RI is there, so these are less a concern. They usually occur due to a scrub trying to clean up a main table (where the RI causes the scrub to fail which is good). The only trouble with these are the additional steps for cleaning them up the sometimes many children also.

RI constraints are easy to maintin, but the real Relational integrity is easy to break. Scrubs and other sources can still easily point some records to the wrong row which is really the main problem. These are hard to repair in a such cases.

> >
> > Perhaps I'm missing something obvious.

No, you are fine Pablo. Good points.

>
> Most people who are suspicious of natural keys are suspicious of
> them for the wrong reasons, as you illustrate above. They don't imply
> any integrity hardship. With the proper constraints and RI, the
> surrogate key stands in for the natural key, hence the name. If you
> can't get the proper controls in on the surrogate, then they can be
> trouble wrt to integrity of data.

Database RI can only tell you that a particular value for a surrogate is not valid. It cannot prevent you from putting Atlanta in Oklahoma if the city and state tables are related only by a surrogate. (no relation to real tables, at least not in our systems).

>
> This said, the only thing I see wrong with surrogate keys is that
> they can cause you extra work. You must, in some cases, think about the
> "properness" of both the surrogate and the natural, alternate key. You

My preference would be the natural key as the PK, and the surrogate as the "alternate". (but then you likely would not need the surrogate)

> now have two columns instead of one, although the impact these days on
> modern DBMSes should be negligeable. Furthermore, it requires additional
> rules on the physical implementation which could hamper performance or
> which might not be doable depending on the engine used. (PK on the
> surrogate and unique index on the complex natural key rather than just a
> single PK on a single complex natural.)

It's not just extra work. When ALL (or nearly all) relationships in a system are identified by surrogate keys (read sequence numbers) you get brain dead programmers using the sequence number from the customer's address as the key to finding his current products purchased. (sorry, didn't mean to rant.)

There is no easy way to clean up some of this data, as I'm sure Daniel knows. Since the surrogates are the ONLY thing common for many tables, there is nothing to compare. Surrogates make a lot of sense in some cases (e.g. Sales Order numbers) where the natural key may be too complex or variable. And I'm not exactly a database designer. But it seems some systems have taken the goal of reducing redundant data to the extreme. And as I mentioned earlier, the system now looks like a network model database, with all the problems model implies.

>
> Plus, I don't understand how someone can go ballistic over
> creating surrogate keys and then turn around and use an SSN or an
> EmployeeID as a PK, for example. These are basically someone else's
> surrogate key, yet are commonly used as PKs.

(Did I go ballistic??)
surrogate keys are okay in the proper place.  I just think they are overdone in many cases.
>
> (Any opinions expressed are strictly mine only and not my employer's)
> --------------------------------------------------------------------
> Paul Tiseo, Intermediate Systems Programmer
> Reply To: 123tiseo.paul_at_123mayo.edu (drop the numbers)

Ed -I am my employer, so you decide whose opinions these are 8^)- Prochak Received on Tue May 21 2002 - 19:44:58 CEST

Original text of this message