Re: Effective deadlock handling

From: Phil Singer <psinger1_at_chartermi.net>
Date: Fri, 02 Jan 2004 23:21:14 -0500
Message-ID: <3FF6433A.4681A19E@chartermi.net>

Saeed wrote:
>
> Hello,
>
> A somewhat perplexing problem that I would like to share, and perhaps in
> doing so someone wiser will spot what I'm doing wrong.
>
> he following description is a made up application to help explain the
> problem, so if it sounds a bit naff, sorry.
>
> An application has a JOB_STATS table as follows:
>
> Column Type Null?
> ------------------- ----------- --------
> ID NUMBER(10) NOT NULL
> PER_ID NUMBER(10) NOT NULL
> JOB_DATE DATE NOT NULL
> NUM_BOOKED NUMBER(4) NOT NULL
> NUM_CANCELLED NUMBER(4) NOT NULL
> NUM_COMPLETED NUMBER(4) NOT NULL
>
> This table is for upto the minute stats that managers can view to see
> how jobs are stacking up for future days, and how the current load and
> backlog is progressing. Actual core data that these stats are based upon
> are maintained elsewhere. Whenever a change in the core data occurs that
> pertains to these stats, this table is updated with appropriate
> triggers. During batch loads in the morning and late evening, this can
> result in hundreds of updates per second from the batch concurrent jobs.
>
> Now, there are occasions when in a single transaction 2 rows may be
> updated, e.g. if PER_ID 10 has cancelled an appointment on 10/12/2003
> and moved it to 11/12/2003, but the auto scheduler has also decided to
> swap jobs for this person, by cancelling the 11/12/2003 and creating one
> on 10/12/2003. So the procedure that would handle this would be called
> by session 1 as:
>
> pa_stats.swap_appt ( p_per_id => 10,
> p_old_date => '10/12/2003',
> p_new_date => '11/12/2003' );
>
> and session 2 would have:
>
> pa_stats.swap_appt ( p_per_id => 10,
> p_old_date => '11/12/2003',
> p_new_date => '10/12/2003' );
>
> The procedure should carry out the following UPDATES:
>
> UPDATE job_stats
> SET NUM_CANCELLED = NUM_CANCELLED - 1
> WHERE per_id = p_per_id
> AND job_date = p_old_date;
>
> UPDATE job_stats
> SET NUM_BOOKED = NUM_BOOKED + 1
> WHERE per_id = p_per_id
> AND job_date = p_new_date;
>
> Now, if the these 2 sessions invoke the procedures as above at the same
> time, we get a deadlock.
>
> To solve this, I have tried a number of methods:
>
> 1. SELECT FOR UPDATE
>
> CURSOR get_lock ( cp_per_id NUMBER,
> cp_old_date DATE,
> cp_new_date DATE ) IS
> SELECT id,
> num_cancelled,
> num_booked
> FROM job_stats
> WHERE per_id = cp_per_id
> AND ( job_date = cp_old_date OR job_date = cp_new_date )
> FOR UPDATE OF num_cancelled, num_booked;
>
> This helps but does not eliminate the deadlocks, especially during batch
> updates.
>
> 2. SELECT FOR UPDATE NOWAIT
>
> As above but with a NOWAIT.
> With this we obviously avoid the deadlock, but the problem then is how
> to deal the "failed" transaction. Retries seem to have limited success.
> The only reliable option appears to be to write the data to an
> intermediate table and then another job would complete the UPDATEs from
> the intermediate data. This is messy, and renders the stats near-time
> rather than real-time.
>
> 3. Exert Some Order in the Locking
>
> We replace the cursor above with:
>
> CURSOR get_lock ( cp_per_id NUMBER,
> cp_date DATE ) IS
> SELECT id,
> num_cancelled,
> num_booked
> FROM job_stats
> WHERE per_id = cp_per_id
> AND job_date = cp_date
> FOR UPDATE OF num_cancelled, num_booked;
>
> Then in the code we try to ensure that the order of the locks is
> consistent:
>
> IF p_old_date < p_new_date THEN
> OPEN get_lock ( p_per_id, p_old_date );
> CLOSE get_lock;
> OPEN get_lock ( p_per_id, p_new_date );
> CLOSE get_lock;
> ELSE
> OPEN get_lock ( p_per_id, p_new_date );
> CLOSE get_lock;
> OPEN get_lock ( p_per_id, p_old_date );
> CLOSE get_lock;
> END IF;
>
> Now this should, to my mind, prevent deadlocks since we know that the
> locks will be obtained in a consistent ordered manner. To my surprise,
> this still did not prevent the deadlocks. I'm surprised because I cannot
> see how a deadlock can occur with this code, even if we have dozens of
> sessions with varying values (maybe three-way or 4-way contention) I
> cant see how with this scheme the deadlock can arise.
>
> Can anyone explain this to me?
>
> Kind regards,
>
> Saeed
>

Daniel is correct, that Oracle rarely puts you into a deadlock (the common exception being an unindexed foreign key).

And Galen is correct, that applications can still put themselves into a deadlock.

Looking at your Case 3: your pseudocode is indicating a lock, but is not indicating what it plans to do with this lock. Now, what normally gets someone in trouble is the following:

User 1 locks a row.
User 2 comes along and locks another row.
User 1 goes and tries to update user 2's row.
User 2 won't let that row go until (s)he updates User 1's row.

etc.

In your case 3, can different physical users do things under the same p_per_id? If so, I can imagine getting into the above sort of deadlock.

Is this a web application? Could the user be double_clicking and sending you a pair of requests for the same dude?

Finally, are you simply updating, or are you doing some complex kind of delete and insert? If so, it is possible that what you want to eventually insert may turn out to be a row blocked by somebody else.

I do hope one of these ramblings turns out to be helpful. I know that this sort of problem is a real bear to resolve.

-- 
Phil Singer                |    psinger1ATchartermiDOTnet
Oracle DBA

Received on Fri Jan 02 2004 - 22:21:14 CST