Re: Normalization, Natural Keys, Surrogate Keys

From: Bob Badour <bbadour_at_golden.net>
Date: Thu, 6 Jun 2002 23:54:44 -0400
Message-ID: <lbWL8.25115$GN2.195830934@radon.golden.net>

KP,

>>My state (CA) is involved in a multi-hundred-million dollar project with a
consulting company to re-write one of our county mainframe systems into a client-server system. ... Unfortunately, I firmly believe the company is blowing smoke... Natural Keys vs. Surrogate Keys... [consulting company claims] "natural keys are necessary to determine functional dependence and the efficient normalization of the the data model." <<

I see you started quite a discussion. I doubt that it shed much light on the subject.

Natural keys are themselves surrogate keys. They are nothing more than familiar surrogates for the things they identify. A name is a surrogate key for a member of one's family. An SSN is a surrogate key that we deem a natural key due its familiarity and because someone else made up the number.

All candidate keys are necessary to normalize. In a normalized database, every non-key attribute of every relation must functionally depend on every candidate key and on no proper subset of any candidate key. It might seem that adding no additional candidate keys to simplify references would reduce the designer's effort to verify the normalization of the design, but this would be a fool's economy. In most cases, it would increase the designer's effort as well as the effort of all who use the design -- including the DBMS. Compound candidate keys often arise as one normalizes the logical design. When compound keys get too complex, it is common and good practice to add another identifying column to a relation to act as an additional candidate key. This provides a simple way to reference the data in the relation and will reduce the number of columns in compound keys for subordinate relations as one further normalizes the design.

For the DBMS to perform its integrity function, one must declare all candidate keys for all relations to the DBMS -- including the "natural" compound candidate keys and the additional simple "surrogate" keys.

If a relation already has a simple candidate key, one not need to make up a new one for simplicity. That does not mean that a relation must never have two simple candidate keys. A trucking company may identify its drivers by driver's license number, by social security number and by employee number. In this case, all three are simple candidate keys and the third was probably added for stability. ie. A driver's license number or SSN might change, but the employee number need never change once assigned by the company.

If the consulting company has proposed a design where half the relations have six or more-column compound candidate keys, I would say they are blowing smoke. If the consulting company has proposed a design where every relation has an arbitrary ID column in addition to the other candidate keys or in place of the other candidate keys, they are likewise blowing smoke.

As an aside:

I notice that you are fond of MySQL and that you believe that it is eating into the Oracle market. Since MySQL is not a DBMS, it does not address the same market that Oracle addresses. When an organization chooses simple file-processing software, it is choosing a completely different type of product from a DBMS. MySQL might eat into Access' marketshare or FoxPro's marketshare or DBase's marketshare or Clipper's marketshare, but its growth does not affect Oracle's share of the DBMS market any more than it affects DB2's market share.

Cheers,
Bob Received on Thu Jun 06 2002 - 22:54:44 CDT