Re: Normalization, Natural Keys, Surrogate Keys

From: Mark D Powell <mark.powell_at_eds.com>
Date: 16 May 2002 07:31:18 -0700
Message-ID: <178d2795.0205160631.71b3a2cd_at_posting.google.com>


Joe Novella <jnovella_at_voicenet.com> wrote in message news:<3CE30FC1.71BCC8B4_at_voicenet.com>...
> Kai Ponte wrote:
>
> > My state (CA) is involved in a multi-hundred-million dollar project
> > with a consulting company to re-write one of our county mainframe
> > systems into a client-server system. (I probably should leave the
> > names out.) Anyway, we recieved this document which they called
> > "detailed design" that was supposed to describe the application they
> > are writing for us. Unfortunately, I firmly believe the company is
> > blowing smoke and has sent us a bunch of ERD's, UML's and various
> > descriptions of data theory. (There are no screen descriptions or
> > workflow items.)
>
> Ideally, a detailed application design should not just include the
> database design, but also the process model, menu structures, screen
> layouts and sequence, as well as a number of other items. Is the system
> being implemented in phases? If so, then a project plan, complete with
> detailed resource constraints and timelines should be included.
>
> >
> > The database being proposed for this applicaiton is Oracle. Fine
> > enough. (I prefer MS SQL or MySQL, just because I'm used to them.) In
> > one of the descriptions in the document, they indicate that they are
> > creating a databse design which uses Natural Keys vs. Surrogate Keys
> > because they "were necessary for the nomalization of the data model."
> > The document went on to say, "natural keys are necessary to determine
> > functional dependence and the efficient normalization of the the data
> > model."
>
> Has the consulting firm made an effort to look at the existing data? How
> do they know that the data can be migrated into their proposed design?
> You mentioned the mainframe. What is the mainframe DBMS? The existing
> data should tell them whether or not natural or surrogate keys are
> necessary. Current data profiling tools can help you normalize existing
> data by inferring functional dependencies from the data. From there, you
> can enhance or modify the model to suit the business needs.
>
> You can develop a data model without looking at the existing data, but
> conversion and ETL processes, not to mention data cleansing efforts are
> far more difficult to manage successfully.
>
> > Now, I may not be a database genius, but this doesn't sound right. It
> > may have been years since my last database design class, but I don't
> > remember anywhere where surrogate keys or natural keys had any impact
> > on the ability of a database designer to normalize a database.
>
> > To make matters worse, they later describe the use of a "code table"
> > to store various codes. This is something I've often seen in IMS and
> > various btrieve-based databases and I don't believe belongs anywhere
> > in a relational model.
>
> A table of code tables can be used to more efficiently manage critical
> sets of values. But this is a design issue, as opposed to a data model
> issue. The data model may contain multiple entities, each with a
> different set of codes, or may declare suitable domains for coded
> attributes (depending on your definition of domain).
>
>
> > Can somebody please comment on this? Am I totally wrong?
>
> Obviously, the specification should be consistent with both design
> principles as well as business requirements. You haven't mentioned the
> business requirements, nor the business case for the mainframe
> replacement, so we cannot comment on those topics. From what you've
> mentioned, the specification starts with the data model, which is good,
> since the data structure helps define almost everything else. However,
> without an assessment of the source data, it's hard to determine whether
> the model can be successfully implemented.
>
> Hope this helps.
> Joe

Kai, I think Joe provided a very good reply. As far as natural vs generated keys I prefer the use of natural keys over generated keys for most cases. The primary reason being that the customer will be seeking the rows that match a specific business key and it will be necessary to index these columns to enforce the business rules and for performance reasons. The generated key ends up being redundant since the business keys provide the relational uniqueness, business rule enforcement, and are the where clause conditions for end-user and adhoc queries. So called natural keys are often easier for the end-users/customers to relate to and that may be part of the reason they were chosen.

Oracle is very good at handling large numbers of concurrent users, large physical data stores, and the combination of the two so I would choose it over MS SQLserver if these conditions exist.

But you may be right about the vendor blowing smoke. It is often impossible to tell. The sales team can be total technical idiots and yet the product may well be rock solid or the sales team could answer every question thrown at them instantly but the product may well be a total piece of crap. In most cases you will not know until after you have already parted with your money.

IMHO -- Mark D Powell -- Received on Thu May 16 2002 - 16:31:18 CEST

Original text of this message