Re: deadlock trace file in weblogic environment

From: Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com>
Date: Fri, 13 Feb 2009 15:35:27 -0600
Message-ID: <203315c10902131335n1569d03ajf2d4d5b4c82f111b_at_mail.gmail.com>



Hi Huang

   I have seen different modes in RAC instances. Here is an example from 9.2.0.8 instance.

Global Wait-For-Graph(WFG) at ddTS[0.44] :

BLOCKED 8e919f168 3 [0x178001f][0x7789a],[TX] [65662,2751] 0

BLOCKER 8e2aa44c8 3 [0x178001f][0x7789a],[TX] [65657,2197] 0

BLOCKED 8e2aa4bd0 3 [0x70023][0x2f68bc],[TX] [65657,2197] 0

BLOCKER 8e558a3c8 3 [0x70023][0x2f68bc],[TX] [65666,213] 0

BLOCKED 8e558aad0 3 [0x179000f][0x89000],[TX] [65666,213] 0

BLOCKER 8e819e070 3 [0x179000f][0x89000],[TX] [65707,152] 0

BLOCKED 8e819e1d8 3 [0x20001][0x4955c6],[TX] [65707,152] 0

BLOCKER 8e919f2d0 3 [0x20001][0x4955c6],[TX] [65662,2751] 0

 Capturing deadlock information is not so easy, especially, if the problem involves multiple instances. Not much information is printed in deadlock trace files (I think, 11g improves this, but haven't tested this thoroughly yet). Of course, to find why there is a deadlock need more information about blockers and blocked etc. View gv$lock is useless.

 gv$ges_enqueue can provide more information. But, it is slow since gv$ges_enqueue encompasses BL locks (Buffer cache locks) also, which tend to be numerous on huge SGAs. Not to mention the fact that, deadlock detection algorithm (controlled by _lm_dd_interval) is smaller than gv$ges_enqueue query response time (at least, in our case). So, I ended up writing a small script to debug a deadlock we recently encountered.

 Hrishy, you can find script rac_check_lock.ksh and rac_check_lock.sql here. Read and execute shell script(which will call sql script) to capture state of blockers and waiters in each node. This should give you more information to debug deadlocks. Please send output, if I can be of any help.

    <a href="http://www.orainternals.com/scripts_rac1.php"> orainternals_scripts_rac </a>

Thanks!

-- 
Cheers

Riyaj Shamsudeen
Principal DBA,
Ora!nternals -  http://www.orainternals.com
Specialists in Performance, Recovery and EBS11i
Blog: http://orainternals.wordpress.com


On Fri, Feb 13, 2009 at 12:34 PM, Yong Huang <yong321_at_yahoo.com> wrote:


> I find that at least the cvt | held mode part is not right; it's always 5.
> For instance,
>
> Global Wait-For-Graph(WFG) at ddTS[0.5] :
> BLOCKED 0xd933e9a0 5 wq 2 cvtops x1 [0x70011][0x5c8f3],[TX]
> [34000-0001-0000522D] 0
> BLOCKER 0xd933e850 5 wq 1 cvtops x8 [0x70011][0x5c8f3],[TX]
> [5C000-0001-00000575] 0
> BLOCKED 0xd933ec58 5 wq 2 cvtops x1 [0x180017][0x3935f],[TX]
> [5C000-0001-00000575] 0
> BLOCKER 0xd933eaf0 5 wq 1 cvtops x8 [0x180017][0x3935f],[TX]
> [34000-0001-0000522D] 0
>
> It was the simplest ORA-60 deadlock created on RAC (Oracle 10.2.0.4). Both
> sessions were in the first instance. Of all RAC deadlock graphs I've seen,
> mode is always 5. But other pieces of info may be correct.
>
> Hrishy, are you saying you see a lot of lines in alert.log reporting
> deadlocks but the trace files on filesystem are not found (except one)? If
> they're removed by somebody or DBA's cron job, just read it from /proc (on
> most UNIXes/Linux). For instance, on my Linux box, alert.log says
> /home/oracle/oracle/product/10.2.0/db/admin/riscs1/bdump/riscs11_lmd0_1295.trc
> has the deadlock trace, but ls says No such file or directory. So I did:
>
> $ ls -l /proc/1295/fd | grep riscs11_lmd0_1295.tr
> l-wx------ 1 oracle oinstall 64 Feb 13 11:46 2 ->
> /home/oracle/oracle/archive/riscs1/bdump/20081201/riscs11_lmd0_1295.trc
> $ vi /proc/1295/fd/2
>
> Obviously, if process 1295 exited, you wouldn't have this luck. But my 1295
> is LMD so it won't exit until instance shutdown.
>
> Yong Huang
>
> > The wait-for-graph columns are:
> > <BLOCKED|BLOCKER> <lockp> <cvt|held mode> <res name> <did|pid|txn_id>
> <node>
> >
> >
> > Regards,
> >
> > Vlado Barun, M.Sc.
> > Sr. Manager, Database Engineering and Operations
> > Jewelry Television
> > Mobile: 865 335 7652
> > Email: vlado.barun_at_jtv.com
>
> >> There are 100's of messages in the alert log saying a deadlock occured
> but
> >> there is only one trace file why is this ?
> >> The trace files gets appended with the error messages for the dead lock.
>
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
-- http://www.freelists.org/webpage/oracle-l
Received on Fri Feb 13 2009 - 15:35:27 CST

Original text of this message