Re: Artificial Primary keys

From: David Cressey <david_at_dcressey.com>
Date: Thu, 24 Jan 2002 14:15:14 GMT
Message-ID: <SFU38.257$Nq6.8458_at_petpeeve.ziplink.net>


The use of SSN to identify employees has to be evaluated at the conceptual level, before being evaluated at the physical level.

With regard to whether a nine byte SSN or a four byte binary integer is "better" from the point of view of storage efficiency, or join efficiency, or whatever, I believe that Jow Celko has said what needs to be said. That's on the
physical level.

On the conceptual level, it's quite a bit murkier than Joe explains it. There are several issues.

The first issue is one of entities.

What does SSN identify? It identifies Social Security registrants. What does employee_id identify? It identifies employees.

If we have a business rule that says, "every employee will be a social security registrant", then we can use SSN to identify employees. But if not, then we can't. Even if we can assert that rule today, we may wish to decide how long that rule will endure, and how we (or our successors) will cope when the rule is allowed to lapse.

I know of many companies who have extended their definition of "employee" to people in countries outside the US. And there was no rule that, in order to be hired by the company, the candidate had to register with US Social Security. There are situations where this is a significant consideration, and others where it is not. This is a modeling issue.

The second issue is one of control over the identifier. The Social Security Administration has done a very good job of not issuing duplicate SSNs. But it's not a perfect job. Indeed the practice of "recycling" SSNs turned out to be a big mistake, and the SSA eventually recognized that fact.

The third issue is one of quality of data. If an employee provides the wrong SSN to the system at hiring time,
there will be repercussions in any event, whether the SSN is used as a key or not. But, if the SSN is used as a key, there will be added repercussions, inside the system. Quality control of an internally assigned key is generally higher than it is for an external key, subject to the quality of the system itself.

All three of the above considerations are relevant to the subject matter, and not to the DBMS technology. But they need to be taken into account when making the decision to use SSN as a key.

--
Regards,
    David Cressey
    www.dcressey.com
Received on Thu Jan 24 2002 - 15:15:14 CET

Original text of this message