Re: Help finding natural keys

From: Alan Gutierrez <ajglist_at_izzy.net>
Date: Sun, 19 Jan 2003 00:32:17 GMT
Message-ID: <lmmW9.13588$I55.454094_at_news2.east.cox.net>


Tim Cross wrote:

>>>>>> "Alan" == Alan Gutierrez <ajglist_at_izzy.net> writes:

> Alan> ISSUE 2
>
> >> What I found of far more concern was the way you generate your
> >> surrogate keys. Determineing a new primary key by looking at the
> >> largest current primary key and adding one to it is very
> >> dangerous. The big danger is that first selecting the maximum
> >> value from a field, adding some amount to it and then inserting
> >> the resulting key back into the table is not an atomic action.
>
> Alan> INSERT INTO Patient (firm_id, patient_id) VALUES (:firm_id, 0)
> Alan> ;
>
> Alan> The above is atomic in PostgreSQL. The first transaction to
> Alan> execute this statement will block any other transaction
> Alan> attempting to execute this statement until this first
> Alan> transactions rolls back or commits.
>
> Alan> UPDATE Patient
> Alan> SET patient_id = (SELECT MAX(patient_id)
> Alan> FROM Patient AS P1
> Alan> WHERE P1.firm_id = firm_id)
> Alan> WHERE firm_id = :firm_id AND patient_id = 0
> Alan> ;
>
> Alan> Voila.

> The actual insert/update operations might be atomic in themselves, but
> not between each other. For example, in your above example, what would
> happen if another process has inserted a new record after you inserted
> a new record, but before you issue the update command to set the
> patient id?

In PostgreSQL the second process would wait until the first process ended it's transaction.

> My interpretation of what you have above would indicate the
> possibility both records would get the same patient id - the
> possibility might be slight because of the firm_id part in the where
> clause and it is 'unlikely' two records from the same firm will be
> inserted and waiting for update at the same time. However 'unlikely'
> is not never.

I understand race conditions. I have tested this sequence of statments to ensure that with concurrent transactions, the first one to preform the insert statement will force other transactions to wait until transaction end to resume execution. This method for generating unique ids does work.

> >> Most commercial (and even most free) DBMS have some facility for
> >> generating id numbers which are guaranteed to be unique (e.g.
> >> sequence). If your system has such a facility, use it rather than
> >> the max(some_column)+1 approach. Apart from guaranteeing your id
> >> numbers will be unique, it will probably be faster than doing a
> >> max() lookup followed by an insert.
>
> Alan> I'm going to put more thought into how I generate my surrogate
> Alan> keys. You are not the only one to raise and eye-brow to SELECT
> Alan> MAX. I chose this over sequences since a sequence was one more
> Alan> database object to maintain. This is an area where I have Alan>
> research to do.
>
> Trust me on this one. If your database provides some sort of object
> for creating unique values in an atomic manner, use it. The
> maintenance overhead will be far less than dealing with problems due
> to transactions which attempt to use an existing unique value. From
> your clients perspective, it will impress them a lot less if they need
> to keep contacting you to ask why once or twice a week they are unable
> to insert new records etc because inserts attempt to use a non-unique
> value as the primary key.

The maintianence overhead of sequences still exists, and the problem of generating duplicate keys has been resolved. Are sequences still the better bet?

Thank you for your concern. Data races and deadlock are a frequently a source of trouble for those new to multi-process environments. They still trip me up on (not-so-rare) occasion. I appreciate all the frantic hand waving. I'll post a sanity check over at a PostgreSQL forum. But this does work.

Alan Gutierrez - ajglist_at_izzy.net Received on Sun Jan 19 2003 - 01:32:17 CET

Original text of this message