Re: Normalization, Natural Keys, Surrogate Keys

From: Joe Novella <jnovella_at_voicenet.com>
Date: Thu, 16 May 2002 01:47:45 GMT
Message-ID: <3CE30FC1.71BCC8B4@voicenet.com>

Kai Ponte wrote:

> My state (CA) is involved in a multi-hundred-million dollar project
> with a consulting company to re-write one of our county mainframe
> systems into a client-server system. (I probably should leave the
> names out.) Anyway, we recieved this document which they called
> "detailed design" that was supposed to describe the application they
> are writing for us. Unfortunately, I firmly believe the company is
> blowing smoke and has sent us a bunch of ERD's, UML's and various
> descriptions of data theory. (There are no screen descriptions or
> workflow items.)

Ideally, a detailed application design should not just include the database design, but also the process model, menu structures, screen layouts and sequence, as well as a number of other items. Is the system being implemented in phases? If so, then a project plan, complete with detailed resource constraints and timelines should be included.

>
> The database being proposed for this applicaiton is Oracle. Fine
> enough. (I prefer MS SQL or MySQL, just because I'm used to them.) In
> one of the descriptions in the document, they indicate that they are
> creating a databse design which uses Natural Keys vs. Surrogate Keys
> because they "were necessary for the nomalization of the data model."
> The document went on to say, "natural keys are necessary to determine
> functional dependence and the efficient normalization of the the data
> model."

Has the consulting firm made an effort to look at the existing data? How do they know that the data can be migrated into their proposed design? You mentioned the mainframe. What is the mainframe DBMS? The existing data should tell them whether or not natural or surrogate keys are necessary. Current data profiling tools can help you normalize existing data by inferring functional dependencies from the data. From there, you can enhance or modify the model to suit the business needs.

You can develop a data model without looking at the existing data, but conversion and ETL processes, not to mention data cleansing efforts are far more difficult to manage successfully.

> Now, I may not be a database genius, but this doesn't sound right. It
> may have been years since my last database design class, but I don't
> remember anywhere where surrogate keys or natural keys had any impact
> on the ability of a database designer to normalize a database.

> To make matters worse, they later describe the use of a "code table"
> to store various codes. This is something I've often seen in IMS and
> various btrieve-based databases and I don't believe belongs anywhere
> in a relational model.

A table of code tables can be used to more efficiently manage critical sets of values. But this is a design issue, as opposed to a data model issue. The data model may contain multiple entities, each with a different set of codes, or may declare suitable domains for coded attributes (depending on your definition of domain).

> Can somebody please comment on this? Am I totally wrong?

Obviously, the specification should be consistent with both design principles as well as business requirements. You haven't mentioned the business requirements, nor the business case for the mainframe replacement, so we cannot comment on those topics. From what you've mentioned, the specification starts with the data model, which is good, since the data structure helps define almost everything else. However, without an assessment of the source data, it's hard to determine whether the model can be successfully implemented.

Hope this helps.
Joe Received on Wed May 15 2002 - 20:47:45 CDT