Re: puzzling deadlock

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 29 Sep 2003 21:08:04 +0100
Message-ID: <bla3dq$c3n$1$8302bc10_at_news.demon.co.uk>

Notes in-line

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

  The educated person is not the person
  who can answer the questions, but the
  person who can question the answers -- T. Schick Jr


One-day tutorials:
http://www.jlcomp.demon.co.uk/tutorial.html
____Belgium__November (EOUG event)
____UK_______December (UKOUG conference)


Three-day seminar:
see http://www.jlcomp.demon.co.uk/seminar.html
____USA__October
____UK___November


The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"Dave Hau" <davehau-no-spam-123_at_no-spam.netscape.net> wrote in message
news:RJKdb.10214$MX.7624_at_newssvr25.news.prodigy.com...


> Can you show me how to get the 4-way deadlock using indexed access


 paths,


> and why the OP's code looks as if it has to be doing a tablescan?


>

4-way deadlock is easy:
    Session1 tries to delete rows A, B, C, D
    in that order, but we slow the deletes down
    by making sure that there are plenty
    of other rows to delete in between

    Session 2 tries to delete rows B, C, D, A
    in that order.

    Session 3 tries to delete rows C, D, A, B
    in that order

    Session 4 tries to delete rows D, A, B, C
    in that order.

    This can be done trivally by having lots of
    data and 4 different indexes to get the ordering.

Arrange the sequence so that
    session 4 locks D
    session 3 locks C
    session 2 locks B
    session 1 locks A

Session 4 now tries to lock A but can't.
Session 3 tries to lock D but can't
Session 2 tries to lock C but can't
Session 1 tries to lock B but can't.

QED


I assumed it was a tablescan because there is
a two-column table, and one of the columns has
a name that looks like an ID which might be indexed
and the other looks like a fact which probably isn't.

The query's WHERE clause looks as if it applies
to the non-indexed column, and there are no group
or order by clauses.  On the other hand, an index-full-scan
could be viable if the table had virtually no data and a high
value for its HWM.

This is a surmise based on word-games, not a scientific
deduction based on visible numbers.



> I guess the most interesting aspect of this deadlock is that the


 four


> sessions are deadlocked on exactly the _same_ delete statement, with


 each


> session holding a row and waiting for a row in exactly the _same_


 block.


>

> From my understanding of how a delete statement is processed, Oracle


 should


> process the implicit query of selecting all rows where numlogins <=0


 first,


> then go through the resulting row set and delete the rows one by


one.

It doesn't actually do this - Oracle starts by being optimistic, and
deleting
rows as it finds them.  Only if it hits a conflicting committed change
will
it rollback the update and restart it by trying a 'select for update'.
This
COULD be why 4 session had the problem - they have collided because
they all updated and committed, then tried the delete.  Given enough
retries and highly concurrent activity, they could lock up, but I
can't
emulate it on demand.




> What

> I don't understand is why the row order in the resulting row set is


> different for each of the four sessions, thus resulting in the 4-way

> deadlock.  One possibility is that the four sessions use different

 query


> plans to process the implicit queries, but why?


>

> Another possibility I thought of was parallelized DML, since Oracle

 9.2 does


> support intra-partition parallelized deletes.  But the granularity


 of


> parallelization is block.  Rows within a single block cannot be


 assigned to


> multiple threads.  Since this deadlock involves only rows within the


 same


> block, this cannot be the reason for the deadlock.


>

> Also, it's not clear why the deadlock only happens on one Oracle

 instance


> and cannot be reproduced on other instances.


>

> You also asked whether connections are being made and broken

 extremely


> rapidly.  How might this be relevant?


>


The text of the query is counting numlogins - if connections are being
made and broken very quickly then is the numlogins column being
updated very frequently ?  If so would this encourage the appearance
of this problem.



> Regards,

> Dave

>

>

Received on Mon Sep 29 2003 - 22:08:04 CEST

This message: [ Message body ]
Next message: Paul Bradford: "Re: selecting ROWNUM causes query to take forever"
Previous message: Dan: "Database Integration Question"
Maybe in reply to: Robert Brown: "puzzling deadlock"
In reply to Dave Hau: "Re: puzzling deadlock"
Next in thread: Paul Bradford: "Re: selecting ROWNUM causes query to take forever"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message