Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Normalization, Natural Keys, Surrogate Keys

Re: Normalization, Natural Keys, Surrogate Keys

From: Bernard Peek <bap_at_shrdlu.com>
Date: Thu, 30 May 2002 23:30:31 +0100
Message-ID: <RGpL5W7Hgq98EwDv@shrdlu.com>


In message <2cf20de8.0205300838.745a1c4a_at_posting.google.com>, Costin Cozianu <c_cozianu_at_hotmail.com> writes

>> There's an alternative solution, add an extra integer field to the main
>> table with a default value of 0. Make the SSN and the new field a
>> composite key. When you have an unknown or invalid (or duplicate) SSN
>> insert a sequential number into the integer field. Any record with a
>> non-zero value should be considered an ongoing problem.
>>
>> If you want you can create a view onto that table that will perform the
>> same functions as the SSN_Exceptions table.
>
>This is a bad solution.
>
>You allow for SSN duplicates, and you put more burden on the app
>developer.

If there are SSN duplicates then the system has to handle that condition. There will be an additional burden on the app developer unless they simply reject the duplicate. That's not a good solution because there is no way that the app can know which of the duplicates is correct (if either.)

> You can't easily solve the sequential number with a trigger
>(depending on the database, it rpbably won't allow you to SELECT the
>table you are modifying in a row level trigger). Even if you solve it
>with a trigger the application isn't aware of it. Probably you'd
>create a stored procedure to take care of insertion.
>
>Or the app will first check for already existing SSN, and then
>proceed, which is quite inefficient.
>
>Compare that with having an UNIQUE constraint on SSN and then the code
>will look straight forward
>
> try {
> ... INSERT ...
> }
> catch (SQLException ex) {
> if (isSsnDuplicateException(ex)) {
> treat SSN exception accordingly
> }
> }
>
>
>But the really bad thing about this design is that it allows duplicate
>SSN to begin with, and it might allow creation and propagation of
>other data related to the bogus SSN through the RowID. For example you
>can't declare in a related Accounts table that the Person_RowID
>foreign key references strictly

It defers the handling of the duplicate. A better solution is to update the Error field for both of the records to indicate that neither of them can be trusted.

>
>And that data might be treated by the system like regular data ( it's
>not likely that programmers will join several tables to find out if
>the sequence number for the person with the specific RowID is 0 or
>not).

I was assuming that the SSN and the sequence number would always be used as a compound foreign key. If it's possible to have duplicate SSNs then it's not possible to use SSN alone as a key.

>
>The goal is not to create an alternate key, but to enforce data
>quality.

The goal is to enforce data quality and creating an alternate key is part of the way of achieving that. It flags the data as suspect. Rejecting it would be an error unless you also reject the entry that it clashed with.

> That's why natural keys are critical, because they are the
>only one that enforce the data quality.

That's true, but if it's possible to have a duplicate value for SSN then it isn't a natural key.

>
>Of course, it all depends on the actual requirements of the actual
>practical problem, but for example in a banking system the situation
>is simple: either your SSN is alright or we don't do business with
>you.

I don't think it's acceptable for business rules to be imposed because a database designer can't handle exceptions in the data. There's nothing stopping me from opening an account in a US bank, they might have a little trouble entering my SSN because I don't have one.

> I hope the DMV and other critical state agencies have the same
>policy: fix your problems with Social Security Administration first,
>get a valid identification and then we'll do bsuiness with you.

The situation may have changed but I seem to recall that in the past it was illegal to use the SSN as an identifier except when dealing with the Social Security Administration. For that reason many people deliberately gave a bogus number whenever they were asked for their SSN.

>
>Why bother to input you in the system with a bogus SSN only to fix it
>later ? I can put your data in an exception table to save you time
>later, but that's as far as one should go.

The fact that a SSN is a duplicate doesn't indicate that it's bogus. It implies that there's been an error. That could be that the SSA has issued the same number twice, or that the new entry is in error, or that the old entry is an error or that the database designer wrongly assumed that duplicates were not permitted.

-- 
Bernard Peek
bap_at_shrdlu.com

In search of cognoscenti
Received on Thu May 30 2002 - 17:30:31 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US