Re: Normalization, Natural Keys, Surrogate Keys

From: David Cressey <david_at_dcressey.com>
Date: Sat, 18 May 2002 13:29:07 GMT
Message-ID: <DGsF8.4816$Bz4.13392_at_petpeeve.ziplink.net>


Tobin,

I'm late to the party on this instance of this debate. In general, normalization is one way to look at a design, and, for certain purposes, the "better" of two designs is the one that's more normalized. However, there are
other ways to measure design "goodness", and sometimes those other ways end up preferring the less normalized design... not because it is less normalized, but because it achieves some other design goal more nearly than the more normalized design does.

> 1. By making the parent id a foreign key, it seems like you're losing
> information in the design. More specifically, a child is related to a
> parent, but not dependent on it for it's existance. So here, the business
> requirements are not well comunicated through the design?
>

It's not really the function of a design to communicate the business requirements. If you need to remember what the requirements were, then document the requirements as such. How well a design satisfies the requirements is more important than how well the design communicates the requirements.

> 2. Everywhere I go people are trading off performance over everything
else.
> Ok, I admit I'm a little touchy on the subject, but 2 of your 3 points
were
> directly related to performance. In my limited experience, I've seen more
> projects fail due to 'understandability' than performance.
>

Basically, I agree. In my own experience, what I'll call "sound" design yields what I'll call "routine" preformance... neither amzingly fast nor surprisingly slow. In exceptional cases, performance requirements require special design. I find that, in most applications the special design can be limited to the subsystem that really needs the performance.

In this case, a few tables might be designed for high performance while trading off simplicity, but most tables would not need to be altered from "sound" design in order to meet performance requirements.

> 3. Hymns in database discussion groups. Now that's definately violating
some
> integrity consraints!?

...and the people all said, "Amen".

>
> I'm keen to learn more, so how can your 'surrogate key' version
communicate
> the same thing as a composite primary key? I now have to examine the
> parent/buesiness rules to realise what it's 'real ' key is, since you've
> replaced it with a surrogate key. BTW, I'm not going to argue about the
> performance issues, although database vendors could probably create their
> own surrogate keys internally to replace the composite key. Which would
> remove the performance probs. (I'm thinking on the fly here, so be gentle)
>

The issue of surrogate keys vs natural keys can be discussed on a more abstract level. Surrogate keys sometimes outlast their natural counterparts. Consider the use of "Customer Id" in a data warehouse. Now consider what happens when two companies merge, and the warehouse has to keep track of customer data for both of them. If the two companies have mutually incompatible "Customer Id" schemes, something is going to have to give. The use of a surrogate key in this case might have prevented a maintenance crisis. There are other cases, but this message is getting long.

David Cressey Received on Sat May 18 2002 - 15:29:07 CEST

Original text of this message