Re: Help finding natural keys

From: Alan Gutierrez <ajglist_at_izzy.net>
Date: Sat, 18 Jan 2003 21:11:18 GMT
Message-ID: <WpjW9.10791$I55.404494_at_news2.east.cox.net>


Tim X wrote:

>>>>>> "Alan" == Alan Gutierrez <ajglist_at_izzy.net> writes:

>
> Alan> I've been reading this group and considering the design of my
> Alan> database used for management of patient charts.
>
> Alan> I am generating keys using select max. I've considered what Mr
> Alan> Celko says about how this is just a sequential file model in
> Alan> SQL, but I try as I might, I can't think of good, er, natural
> Alan> keys for my entities.

Mr Cross

First, let me thank you for taking the time to provide me with such a detailed answer. It is thoughtful and covers all the bases. Thank you very much.

> I think there are two issues raised here
>
> 1. Selecting what to use for your primary key - surrogate or natural
> 2. If surrogate, how to generate that key.

ISSUE 1
> There is a lot of debate on which is better surrogate or natural.

> The bottom line, there are no guaranteed techniques for selecting
> surrogate vs natural keys, but inability to identify acceptable
> candidates for a natural primary key is possibly an indication your
> model is either representing an entity which is really a combination
> of one or more distinct entities or your entity is missing important
> attributes.

Through c.d.theory I've learned a lot about surrogate versus natural keys, and I've come to understand why a natural key suitable for use as a primary key is difficult to come by in my application. I can see how the lack of a natural key is a possible indicator of poor design. A surrogate key can bring together attributes that don't belong together. I'm pretty certain that this is not the case in my schema, however, thanks to all the insight received and exercise performed through this discussion.

> Entities representing humans can be difficult because the thing we
> immediately think of as the natural key is a persons name. However,
> names are not unique. Sometimes you can add additional
> entities about the person which will provide sufficient uniqueness for
> your domain. For example, first_name+last_name+date_of_birth may be
> sufficient (if you added birth time it might even be better, but how
> many people know the time of their birth?)

This is true. The patient entity does not really represent a human, it represents a patient's chart. The charts have rules that don't apply to the person they represent. And they really are, in part, unidentified by the organization that created them.

> Even when you find you need to resort to a surrogate primary key, you
> will still need some sort of key to identify the patients. For
> example, if your primary key was just an artificially generated
> surrogate key, how will you then determine which patient called Joan
> Smith you are dealing with - its almost certain nobody will know what
> their patient number is within the system, so you will still need some
> sort of additional information to identify that patient.

That the application is to be shared by dozens of small organizations makes the choice of natural keys that much more difficult. (It is a time share system.) Some organizations will naturally identify a patient by first name, last name, and to break a tie, room number. Home health care organizations won't have rooms however. In emergency situations, the name might not be known at all.

This is part of what makes me feel that the lack natural key is not such a strong indication of a weak design. I cannot find a natural key for a patient, but I know that patients exist, and that they can be uniquely identified, it is just that the identifiers, finger prints, iris patterns, dental impressions, are not suitable for my application.

How will the user find a patient? The user will browse a directory of patients, narrowed to the patients in the user's organization, by first and last name. In the case of Mr John Doe, who is a regular in the emergency room, the means by which the user narrows their search will probably be situational. Of a dozen John Does, only one may be 450 lbs, or have a neck injury, or be in room 12b.

This natural key search is starting to seem like a bit of a boondoggle. It seems that their are many entities out there unique in their own particular way. There are many attributes that many be natural keys like names, but what happens when you internationalize? Ongoing Nursing Assessment may uniquely identify a questionnaire for an organization, but it is not the natural key for that questionnaire in Mexico. I'm becoming partial to surrogate keys not because they are easier, but because they are neutral.

ISSUE 2
> What I found of far more concern was the way you generate your
> surrogate keys. Determineing a new primary key by looking at the
> largest current primary key and adding one to it is very
> dangerous. The big danger is that first selecting the maximum value
> from a field, adding some amount to it and then inserting the
> resulting key back into the table is not an atomic action.

INSERT INTO Patient (firm_id, patient_id) VALUES (:firm_id, 0)
;

The above is atomic in PostgreSQL. The first transaction to execute this statement will block any other transaction attempting to execute this statement until this first transactions rolls back or commits.

UPDATE Patient

   SET patient_id = (SELECT MAX(patient_id)

                       FROM Patient AS P1
                      WHERE P1.firm_id = firm_id)
 WHERE firm_id = :firm_id AND patient_id = 0 ;

Voila.

> Most commercial (and even most free) DBMS have some facility for
> generating id numbers which are guaranteed to be unique
> (e.g. sequence). If your system has such a facility, use it rather
> than the max(some_column)+1 approach. Apart from guaranteeing your id
> numbers will be unique, it will probably be faster than doing a max()
> lookup followed by an insert.

I'm going to put more thought into how I generate my surrogate keys. You are not the only one to raise and eye-brow to SELECT MAX. I chose this over sequences since a sequence was one more database object to maintain. This is an area where I have research to do.

Thank you once again.

Alan Gutierrez - ajglist_at_izzy.net Received on Sat Jan 18 2003 - 22:11:18 CET

Original text of this message