Path: news.easynews.com!easynews!news.good.net!news-dc.gip.net!news-fw.gip.net!news.gsl.net!gip.net!feeder.kornet.net!news.maxwell.syr.edu!newsfeed.mathworks.com!news.voicenet.com!news2.voicenet.com.POSTED!53ab2750!not-for-mail
Message-ID: <3CE30FC1.71BCC8B4@voicenet.com>
From: Joe Novella <jnovella@voicenet.com>
X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
Newsgroups: comp.databases.oracle.misc,comp.databases.theory,comp.databases.ms-sqlserver,microsoft.public.sqlserver.programming
Subject: Re: Normalization, Natural Keys, Surrogate Keys
References: <c1ec9b8f.0205151525.6dce24d2@posting.google.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 73
Date: Thu, 16 May 2002 01:47:45 GMT
NNTP-Posting-Host: 207.103.252.226
X-Complaints-To: abuse@voicenet.com
X-Trace: news2.voicenet.com 1021513665 207.103.252.226 (Wed, 15 May 2002 21:47:45 EDT)
NNTP-Posting-Date: Wed, 15 May 2002 21:47:45 EDT
Xref: easynews comp.databases.oracle.misc:81651 comp.databases.theory:20780 comp.databases.ms-sqlserver:71989 microsoft.public.sqlserver.programming:237961
X-Received-Date: Wed, 15 May 2002 18:45:28 MST (news.easynews.com)

Kai Ponte wrote:

> My state (CA) is involved in a multi-hundred-million dollar project
> with a consulting company to re-write one of our county mainframe
> systems into a client-server system.  (I probably should leave the
> names out.)  Anyway, we recieved this document which they called
> "detailed design" that was supposed to describe the application they
> are writing for us.  Unfortunately, I firmly believe the company is
> blowing smoke and has sent us a bunch of ERD's, UML's and various
> descriptions of data theory. (There are no screen descriptions or
> workflow items.)

Ideally, a detailed application design should not just include the
database design, but also the process model, menu structures, screen
layouts and sequence, as well as a number of other items.  Is the system
being implemented in phases?  If so, then a project plan, complete with
detailed resource constraints and timelines should be included.

>
> The database being proposed for this applicaiton is Oracle. Fine
> enough. (I prefer MS SQL or MySQL, just because I'm used to them.) In
> one of the descriptions in the document, they indicate that they are
> creating a databse design which uses Natural Keys vs. Surrogate Keys
> because they "were necessary for the nomalization of the data model."
> The document went on to say, "natural keys are necessary to determine
> functional dependence and the efficient normalization of the the data
> model."

Has the consulting firm made an effort to look at the existing data?  How
do they know that the data can be migrated into their proposed design?
You mentioned the mainframe.  What is the mainframe DBMS?  The existing
data should tell them whether or not natural or surrogate keys are
necessary.  Current data profiling tools can help you normalize existing
data by inferring functional dependencies from the data.  From there, you
can enhance or modify the model to suit the business needs.

You can develop a data model without looking at the existing data, but
conversion and ETL processes, not to mention data cleansing efforts are
far more difficult to manage successfully.

> Now, I may not be a database genius, but this doesn't sound right. It
> may have been years since my last database design class, but I don't
> remember anywhere where surrogate keys or natural keys had any impact
> on the ability of a database designer to normalize a database.

> To make matters worse, they later describe the use of a "code table"
> to store various codes.  This is something I've often seen in IMS and
> various btrieve-based databases and I don't believe belongs anywhere
> in a relational model.

A table of code tables can be used to more efficiently manage critical
sets of values.  But this is a design issue, as opposed to a data model
issue.  The data model may contain multiple entities, each with a
different set of codes, or may declare suitable domains for coded
attributes (depending on your definition of domain).


> Can somebody please comment on this? Am I totally wrong?

Obviously, the specification should be consistent with both design
principles as well as business requirements.  You haven't mentioned the
business requirements, nor the business case for the mainframe
replacement, so we cannot comment on those topics.  From what you've
mentioned, the specification starts with the data model, which is good,
since the data structure helps define almost everything else.  However,
without an assessment of the source data, it's hard to determine whether
the model can be successfully implemented.

Hope this helps.
Joe



