Re: Normalization, Natural Keys, Surrogate Keys

From: Larry Coon <larry_at_assist.org>
Date: Thu, 16 May 2002 09:43:46 -0700
Message-ID: <3CE3E1C2.7DAD_at_assist.org>


Anton Versteeg wrote:

> It looks like they have started already to implement the design.
> Did you ever receive a conceptual or logical data model?
>
> Designing a database without looking at how users are going to interact
> (screens, reports etc) with the system, sounds a bit odd to me.

Another school of thought says that all uses to which the data will ever be put are not known at design time, so the conceptual design should be independent of any application, hardware, O/S, DBMS, language, user, etc.

> > one of the descriptions in the document, they indicate that they are
> > creating a databse design which uses Natural Keys vs. Surrogate Keys
> > because they "were necessary for the nomalization of the data model."
> > The document went on to say, "natural keys are necessary to determine
> > functional dependence and the efficient normalization of the the data
> > model."

(I know I'm answering both of you in the same post here...)

I'm not so sure what "efficient normalization" means. In fact, it sounds like an oxymoron. Since normalization involves decomposing tables and therefore requires more joins to query the data, a normalized database is less efficient, but the loss of efficiency is the price you pay for reducing data anomalies.

Second, I don't buy the statement that natural keys are necessary to determine functional dependence. FD's are a property of the problem domain. If you have a table like this:

                   PARTS

part_no warehouse_no qty warehouse_city ------- ------------ --- --------------
123      10            100  Anaheim
123      22             50  Fullerton
234      22            400  Fullerton

The fact that warehouse_no --> warehouse_city is inherent in the business logic. If I add a surrogate key I don't suddenly lose the ability to determine that this functional dependency exists.

Third, the statement is circular. If natural keys are necessary to determine functional dependence, and a natural key is really an instance of a functional dependence then which do you know first? You can't know the functional dependencies first, by his statement above. But you can't know the natural keys first either, because until you know that there is an FD of attribute(s) -> tuple, you don't know something is a candidate key.

> > To make matters worse, they later describe the use of a "code table"
> > to store various codes. This is something I've often seen in IMS and
> > various btrieve-based databases and I don't believe belongs anywhere
> > in a relational model.

Why not? In the PARTS example I gave above, the 3NF form would be PARTS(part_no, warehouse_no, qty) and WAREHOUSE(warehouse_no, city), the WAREHOUSE table functions as a code table.

> Using a single code table as opposed to multiple tables or other
> constraints like field check constraints has advantages and
> disadvantages. If I understand you correctly they want something like
> (code_type, code_value, code_description).
> Advantage is that you can easily add new code types without changing the
> data model. This works best if your codes all have the same or almost the
> same data type. A big disadvantage is that you need to store the
> code_type in your other data together with the code values.

How so? Code tables are good for domain enforcement when the domain is large and/or changes. So for a car database:

   COLOR
code color
---- --------
12 Beige
23 Silver

And the data tables store 12 or 23. The advantage is that when Marketing changes "Beige" to "Autumn Bronze" and "Silver" to "Moon Mist" the change is localized, and when the domain of colors changes, the change can be reflected in the database via DML in the code table.

> This not only
> increases the amount of data you need to store but also complicates the
> databse and application design. What to do if a table has several
> different code types?

Several different code tables.

> Specially if many of the codes just have a few
> valid values like Y/N this is in my experience not the way to go.

Right.

> If your company cannot review the design, I would ask a third party to do
> a review.
> My 2 cents.

Best advice yet. Sounds like a potential fiasco in the making.

Larry Coon
University of California
larry_at_assist.org
and lmcoon_at_home.com Received on Thu May 16 2002 - 18:43:46 CEST

Original text of this message