Re: Normalization, Natural Keys, Surrogate Keys

From: Kai Ponte <>
Date: 24 May 2002 11:23:45 -0700
Message-ID: <>

> (Kai Ponte) wrote in message news:<>...
> >
> > In reflection I tend to agree. By using an artificial surrogate key -
> > aside from making the database easier to read - we avoid the problem
> > of potenially changing data.
> Well, I don't see how you make your database easier to read, if as in
> many OO designed databases I see User_OID instead of UserName, I don;t
> find that easier to read. Not to mention abominations like
> Language_OID, Zip_OID, Currency_OID and stuff like that.

I agree here - Hence I tend to use very descriptive field names. Here's an example from a personal database I'm setting up under MySQL.  I have a comments table with RowID, cmtBootID, cmtDateEntered, cmdText. Pretty easy to read, IMO. And don't even get me started on these COBOL programmers who designed our IMS and DB/2 databases. Some examples: WRK-MAPT-CURR-YEAR-X, CWD-FRLG-QUAL, CWDOICMT-SSA... These all-caps examples are from a COBOL program I'm modifying in our IMS system.

> >What if, for example, the person
> > providing the SSN provided a false SSN? Does that happen? Yes. Often.
> Well, then if it happens there has to be business rules to deal with
> this "happening". Relying on the surrogate key, does not solve the
> problem, it simply says that you're hiding your head in the sand,
> praying for the best.

Yes, and - I'm sorry - I didn't make that clear in my post. Relying soley on the DB to enforce anything is - in my book - not a good practice. I believe wholly that the database is part of an application. It should be designed as part of the application and not in a theoretical vacumn. I believe, therefore, the database designer needs to work with the appliccation designers.

In the case I demonstrated above, which is a real-world example, the end users are required to generate a "fake" SSN for new people who either don't have an SSN assigned yet or cannot remember thiers. The end users are typically instructed to provide 999-99-9999 as the SSN.

In terms of checking for a duplicate when a surrogate key is used then that can be done through the application. If this is practiced then you can choose what to do with the duplicate entry - reject it, allow it, modify the existing entry.

> For example, if I have a surrogate key in your system, and I come to
> ask the operator to change my SSN, something is not quite right.

Um, why? You mean not right with the DBMS or not legally right?

> That's probably a bad design.
Probably is a bad desing in terms of theory, but it works for us. In designing databases with well over 50 tables and millions of records, it makes for very easy maintenance and upgrades, which is my primary goal as a developer and project manager.

> but at the very least if
> ResourceService represents the relationships between Resource and
> Service you SHOULD drop RowID altogether and have the normal primary
> key (ResourceID, ServiceID).
Which would bring me back in to the nightmare of reading goofy primary keys joined out of multiple columns - exactly the sort of thing I am avoiding. Besides if 49 tables use RowID as the primary key and one doesn't that breaks my design apart.

> What does it mean if you have two rows differing only by RowID ? It
> means nothing is a data error.
Huh? :)

> Furthermore, when a program might want
> to delete a record, it might delete by rowID (only one of the logical
> duplicates) while from the logical/business rules point of view the
> system will still contain the reklationship between resource and
> service when it shouldn't.
...and that's where the programmers come in to play. In my detailed documentation I always ensure that these things get taken care of.  

> >I could have used the address as part of the primary key, but...
> You can't. The burden of the physical implementation of a B-tree index
> might be too great. H
B-tree? AAAAAAHHHHHH! (Sorry, six years of dealing with Keyfile, five years of dealing with Btrieve and two years of dealing with IMS have taken their toll.)

> Having a surrogate primary key shouldn't let you off the hook. As a
> rule of thumb, you always HAVE TO HAVE an alternate UNIQUE constraint.
I respectfully disagree with "always have to have." Yes, it may often be necessary, but not always.  

> A good rule of thumb is to always and automatically start your design
> by identifying natural keys and using them, and only when you discover
> serious difficulties at the implementation/physical level, only then
> you introduce surrogate keys, while you keep the UNIQUE constraints.
> You might want to drop the idea of declaring PRIMARY KEY in most cases
> and use only CONSTRAINT xxx UNIQUE ...
Unless I'm wrong, in both MySQL and M$SQL you must identifiy a primary key. Something I don't know?

> You might want to have a look at:
I've downloaded the PDF and will be reading it on the way home this evening.

Thank you for your input to this matter. I hope I don't offend anybody by offering my warped view on the world.

KP Received on Fri May 24 2002 - 20:23:45 CEST

Original text of this message