Re: Normalization, Natural Keys, Surrogate Keys

From: Costin Cozianu <>
Date: 30 May 2002 09:38:51 -0700
Message-ID: <>

Bernard Peek <> wrote in message news:<>...
> In message <
> >, Costin Cozianu <> writes
> >For example, in the above, because you allow for that 999999999 for
> >situations that are very rare (let's say <1% or <5% of the cases), you
> >fail to protect the vast majority of cases, because you won't be able
> >to declarer the UNIQUE constraint on the SSN. There are better way to
> >deal with this, one example that I suggested is create an
> >SSN_Exceptions table where you put the data collected for the persons
> >who don't have a SSN or their SSN has problems.
> There's an alternative solution, add an extra integer field to the main
> table with a default value of 0. Make the SSN and the new field a
> composite key. When you have an unknown or invalid (or duplicate) SSN
> insert a sequential number into the integer field. Any record with a
> non-zero value should be considered an ongoing problem.
> If you want you can create a view onto that table that will perform the
> same functions as the SSN_Exceptions table.

This is a bad solution.

You allow for SSN duplicates, and you put more burden on the app developer. You can't easily solve the sequential number with a trigger (depending on the database, it rpbably won't allow you to SELECT the table you are modifying in a row level trigger). Even if you solve it with a trigger the application isn't aware of it. Probably you'd create a stored procedure to take care of insertion.

Or the app will first check for already existing SSN, and then proceed, which is quite inefficient.

Compare that with having an UNIQUE constraint on SSN and then the code will look straight forward

  try {
   ... INSERT ...
  catch (SQLException ex) {
    if (isSsnDuplicateException(ex)) {

       treat SSN exception accordingly

But the really bad thing about this design is that it allows duplicate SSN to begin with, and it might allow creation and propagation of other data related to the bogus SSN through the RowID. For example you can't declare in a related Accounts table that the Person_RowID foreign key references strictly

And that data might be treated by the system like regular data ( it's not likely that programmers will join several tables to find out if the sequence number for the person with the specific RowID is 0 or not).

The goal is not to create an alternate key, but to enforce data quality. That's why natural keys are critical, because they are the only one that enforce the data quality.

Of course, it all depends on the actual requirements of the actual practical problem, but for example in a banking system the situation is simple: either your SSN is alright or we don't do business with you. I hope the DMV and other critical state agencies have the same policy: fix your problems with Social Security Administration first, get a valid identification and then we'll do bsuiness with you.

Why bother to input you in the system with a bogus SSN only to fix it later ? I can put your data in an exception table to save you time later, but that's as far as one should go. Received on Thu May 30 2002 - 18:38:51 CEST

Original text of this message