Re: puzzling deadlock

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 29 Sep 2003 21:08:04 +0100
Message-ID: <bla3dq$c3n$1$8302bc10_at_news.demon.co.uk>


Notes in-line

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

  The educated person is not the person
  who can answer the questions, but the
  person who can question the answers -- T. Schick Jr


One-day tutorials:
http://www.jlcomp.demon.co.uk/tutorial.html
____Belgium__November (EOUG event)
____UK_______December (UKOUG conference)


Three-day seminar:
see http://www.jlcomp.demon.co.uk/seminar.html
____USA__October
____UK___November


The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"Dave Hau" <davehau-no-spam-123_at_no-spam.netscape.net> wrote in message
news:RJKdb.10214$MX.7624_at_newssvr25.news.prodigy.com...

> Can you show me how to get the 4-way deadlock using indexed access
paths,
> and why the OP's code looks as if it has to be doing a tablescan?
>
4-way deadlock is easy: Session1 tries to delete rows A, B, C, D in that order, but we slow the deletes down by making sure that there are plenty of other rows to delete in between Session 2 tries to delete rows B, C, D, A in that order. Session 3 tries to delete rows C, D, A, B in that order Session 4 tries to delete rows D, A, B, C in that order. This can be done trivally by having lots of data and 4 different indexes to get the ordering. Arrange the sequence so that session 4 locks D session 3 locks C session 2 locks B session 1 locks A Session 4 now tries to lock A but can't. Session 3 tries to lock D but can't Session 2 tries to lock C but can't Session 1 tries to lock B but can't. QED I assumed it was a tablescan because there is a two-column table, and one of the columns has a name that looks like an ID which might be indexed and the other looks like a fact which probably isn't. The query's WHERE clause looks as if it applies to the non-indexed column, and there are no group or order by clauses. On the other hand, an index-full-scan could be viable if the table had virtually no data and a high value for its HWM. This is a surmise based on word-games, not a scientific deduction based on visible numbers.
> I guess the most interesting aspect of this deadlock is that the
four
> sessions are deadlocked on exactly the _same_ delete statement, with
each
> session holding a row and waiting for a row in exactly the _same_
block.
>
> From my understanding of how a delete statement is processed, Oracle
should
> process the implicit query of selecting all rows where numlogins <=0
first,
> then go through the resulting row set and delete the rows one by
one. It doesn't actually do this - Oracle starts by being optimistic, and deleting rows as it finds them. Only if it hits a conflicting committed change will it rollback the update and restart it by trying a 'select for update'. This COULD be why 4 session had the problem - they have collided because they all updated and committed, then tried the delete. Given enough retries and highly concurrent activity, they could lock up, but I can't emulate it on demand.
> What
> I don't understand is why the row order in the resulting row set is
> different for each of the four sessions, thus resulting in the 4-way
> deadlock. One possibility is that the four sessions use different
query
> plans to process the implicit queries, but why?
>
> Another possibility I thought of was parallelized DML, since Oracle
9.2 does
> support intra-partition parallelized deletes. But the granularity
of
> parallelization is block. Rows within a single block cannot be
assigned to
> multiple threads. Since this deadlock involves only rows within the
same
> block, this cannot be the reason for the deadlock.
>
> Also, it's not clear why the deadlock only happens on one Oracle
instance
> and cannot be reproduced on other instances.
>
> You also asked whether connections are being made and broken
extremely
> rapidly. How might this be relevant?
>
The text of the query is counting numlogins - if connections are being made and broken very quickly then is the numlogins column being updated very frequently ? If so would this encourage the appearance of this problem.
> Regards,
> Dave
>
>
Received on Mon Sep 29 2003 - 22:08:04 CEST

Original text of this message