Re: Help finding natural keys

From: Tim X <timx_at_spamto.devnul.com>
Date: 19 Jan 2003 10:38:04 +1100
Message-ID: <87fzrqm3kz.fsf_at_tiger.rapttech.com.au>


>>>>> "Alan" == Alan Gutierrez <ajglist_at_izzy.net> writes:

 Alan> ISSUE 2

>> What I found of far more concern was the way you generate your
>> surrogate keys. Determineing a new primary key by looking at the
>> largest current primary key and adding one to it is very
>> dangerous. The big danger is that first selecting the maximum
>> value from a field, adding some amount to it and then inserting
>> the resulting key back into the table is not an atomic action.

 Alan> INSERT INTO Patient (firm_id, patient_id) VALUES (:firm_id, 0)  Alan> ;

 Alan> The above is atomic in PostgreSQL. The first transaction to
 Alan> execute this statement will block any other transaction
 Alan> attempting to execute this statement until this first
 Alan> transactions rolls back or commits.

 Alan> UPDATE Patient
 Alan>    SET patient_id = (SELECT MAX(patient_id)
 Alan>                        FROM Patient AS P1
 Alan>                       WHERE P1.firm_id = firm_id)
 Alan> WHERE firm_id = :firm_id AND patient_id = 0  Alan> ;

 Alan> Voila.

The actual insert/update operations might be atomic in themselves, but not between each other. For example, in your above example, what would happen if another process has inserted a new record after you inserted a new record, but before you issue the update command to set the patient id? My interpretation of what you have above would indicate the possibility both records would get the same patient id - the possibility might be slight because of the firm_id part in the where clause and it is 'unlikely' two records from the same firm will be inserted and waiting for update at the same time. However 'unlikely' is not never.

I have been brought in to work on at least 10 small systems which have suffered from this very mistake. In a few of the cases the problem was the database did not provide any sequence type object, so the developer was forced to use the same technique as you have adopted. During development and initial testing, everything worked fine. It was not until a real production environment the problems began to show.

In essence, the problem is still that of atomic transacitons. Individual parts of using the max() approach may be atomic, but you need to do at least two steps in any solution which uses this approach and these two steps when combined do not represent an atomic transaction. If the transaction, from start to finish is not atomic, then all bets are off.

>> Most commercial (and even most free) DBMS have some facility for
>> generating id numbers which are guaranteed to be unique
>> (e.g. sequence). If your system has such a facility, use it rather
>> than the max(some_column)+1 approach. Apart from guaranteeing your
>> id numbers will be unique, it will probably be faster than doing a
>> max() lookup followed by an insert.

 Alan> I'm going to put more thought into how I generate my surrogate
 Alan> keys. You are not the only one to raise and eye-brow to SELECT
 Alan> MAX. I chose this over sequences since a sequence was one more
 Alan> database object to maintain. This is an area where I have
 Alan> research to do.

Trust me on this one. If your database provides some sort of object for creating unique values in an atomic manner, use it. The maintenance overhead will be far less than dealing with problems due to transactions which attempt to use an existing unique value. From your clients perspective, it will impress them a lot less if they need to keep contacting you to ask why once or twice a week they are unable to insert new records etc because inserts attempt to use a non-unique value as the primary key.

Tim

-- 
Tim Cross
The e-mail address on this message is FALSE (obviously!). My real e-mail is
to a company in Australia called rapttech and my login is tcross - if you 
really need to send mail, you should be able to work it out!
Received on Sun Jan 19 2003 - 00:38:04 CET

Original text of this message