Re: Normalization, Natural Keys, Surrogate Keys

From: Anton Versteeg <>
Date: Thu, 16 May 2002 09:38:50 +0200
Message-ID: <>

Kai Ponte wrote:

> My state (CA) is involved in a multi-hundred-million dollar project
> with a consulting company to re-write one of our county mainframe
> systems into a client-server system. (I probably should leave the
> names out.) Anyway, we recieved this document which they called
> "detailed design" that was supposed to describe the application they
> are writing for us. Unfortunately, I firmly believe the company is
> blowing smoke and has sent us a bunch of ERD's, UML's and various
> descriptions of data theory. (There are no screen descriptions or
> workflow items.)

It looks like they have started already to implement the design. Did you ever receive a conceptual or logical data model?

Designing a database without looking at how users are going to interact (screens, reports etc) with the system, sounds a bit odd to me.

> The database being proposed for this applicaiton is Oracle. Fine
> enough. (I prefer MS SQL or MySQL, just because I'm used to them.) In

I would prefer DB2 :)
You didn't mention what you are curently using on the mainframe. That could be a consideration too. Mainframes can participate very well in a (web)client/server or environment.

> one of the descriptions in the document, they indicate that they are
> creating a databse design which uses Natural Keys vs. Surrogate Keys
> because they "were necessary for the nomalization of the data model."
> The document went on to say, "natural keys are necessary to determine
> functional dependence and the efficient normalization of the the data
> model."

Not in a conceptual model.

> Now, I may not be a database genius, but this doesn't sound right. It
> may have been years since my last database design class, but I don't
> remember anywhere where surrogate keys or natural keys had any impact
> on the ability of a database designer to normalize a database.
> To make matters worse, they later describe the use of a "code table"
> to store various codes. This is something I've often seen in IMS and
> various btrieve-based databases and I don't believe belongs anywhere
> in a relational model.

Using a single code table as opposed to multiple tables or other constraints like field check constraints has advantages and disadvantages. If I understand you correctly they want something like (code_type, code_value, code_description). Advantage is that you can easily add new code types without changing the data model. This works best if your codes all have the same or almost the same data type. A big disadvantage is that you need to store the code_type in your other data together with the code values. This not only increases the amount of data you need to store but also complicates the databse and application design. What to do if a table has several different code types? Specially if many of the codes just have a few valid values like Y/N this is in my experience not the way to go.

> Can somebody please comment on this? Am I totally wrong?
> KP

If your company cannot review the design, I would ask a third party to do a review.
My 2 cents.

Anton Versteeg
DB2 Specialist
IBM Netherlands
Received on Thu May 16 2002 - 09:38:50 CEST

Original text of this message