Re: Normalization, Natural Keys, Surrogate Keys
Date: 29 May 2002 09:20:44 -0700
c_cozianu_at_hotmail.com (Costin Cozianu) wrote in message news:<2cf20de8.0205260901.19582f0a_at_posting.google.com>...
> For example, in the above, because you allow for that 999999999 for
> situations that are very rare (let's say <1% or <5% of the cases), you
> fail to protect the vast majority of cases, because you won't be able
> to declarer the UNIQUE constraint on the SSN. There are better way to
> deal with this, one example that I suggested is create an
> SSN_Exceptions table where you put the data collected for the persons
> who don't have a SSN or their SSN has problems.
> Then, you rely on trhe app programmers to check (for duplicates)
> before insert, most of whom will probably not do becausde they've been
> indoctrinated by objects and object identity, and worse, they might
> use Object/Relational frameworks (even worse, Enterprise Java Beans)
> that work on cruise-controle, they're happy with the artificial
> identity (RowID in this case). So don't be surprise if duplicate SSNs
> will appear.
> The right way to do it is to have an UNIQUE constraint, the
> application will try to insert the data (without checking for
> duplicates), and the database will fail the INSERT in case of invalid
> data, then the application can catch the exception and follow the
> business rules for dealing with these exceptions.
Okay, you've made me think. (I hate it when that happens.) I've discussed this issue with some of my co-workers and here's what I've come up with.
- Yes, having a unique column (or group of columns) in the table is beneficial. This is done even if the column(s) isn't (arent) the PK.
- Allowing the DBMS to catch the duplicates and failing on an insert is a benefit when programmers don't do their due dilligence. (That never happens, right?)
HOWEVER... I thought about the exception table idea and I believe it is a bad design in most instances that I can forsee. (Remember this is just my narrow-minded opinion.) Here's why. Let's say, as in the examples we've been using, we have a table which includes a "supposedly" unique SSN. Let's say too that we want the DBMS to catch any duplicate SSNs and not allow for an INSERT. You suggest that an exception table be created where the duplicates reside. I can see how this would benefit the enforcement of a uniqe column.
The problem that I see is in how to deal with the exceptions. In a real-world situation, the exceptions may never get handled and they will live in the "exeption" table forever. Thus you end up with "duplicate" data in two tables. As a result, problems will crop up with foreign key relationships since they will need to be related to both the primary and exeption tables.
Now, let's say we setup a design with only one table as so:
RowID(PK), SSN, Occurance
We still use the surrogate key to maintain a separation of the business data from the primary key. The SSN column will allow duplicates. The new column added is the occurance column which is set to zero unless a duplicate SSN is entered. Our UNIQUE constraint is the combination of the SSN and occurance. Using my example with data we would see something like the following:
RowID SSN Occurance 1 123-45-6789 0 2 123-43-8989 0 3 123-45-6789 1
The data still is allowed to be entered and it exists in only one table. The occurance flag is incremented each time a duplicate SSN is entered. Programming can be created to deal with the exceptions and provide for a clean up when time permits. (Read Steven Covey's parable on sharpening the saw to determine when this happens.) This design allows for the exeptions you mentioned without forcing the DBMS to maintain a second table.
I think this thread and the discussions within have reinforced the
idea that using a surrogate key is not only a better design for
relational databases but is flexible enough for any situation. I want
to thank all of those who have contributed your thoughts so far.
Kai PonteReceived on Wed May 29 2002 - 18:20:44 CEST