Re: deadlock trace file in weblogic environment
Date: Fri, 13 Feb 2009 22:02:31 -0800 (PST)
You're right, Riyaj. I googled and found a few wait-for-graphs that have mode 3 or 4, although most are 5.
I think your graph can be interpreted sequentially along the lines: process with transaction ID's (whatever it is) [65662,2751] is blocked by process [65657,2197], which is blocked by the next one, which is blocked by next, ... and it wraps back up to [65662,2751]. I don't know what they're doing, and if all of them are on the same instance as in your case, you probably don't even see the SQL involved. Oracle definitely needs to improve the trace file to at least the same as a non-RAC deadlock trace.
Thanks for the info about _lm_dd_interval. I noticed 11g shortens the default value of it from 60 to 10, and added a few new _lm_dd% params.
- On Fri, 2/13/09, Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com> wrote:
> From: Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com>
> Subject: Re: deadlock trace file in weblogic environment
> To: yong321_at_yahoo.com
> Cc: oracle-l_at_freelists.org
> Date: Friday, February 13, 2009, 3:35 PM
> Hi Huang
> I have seen different modes in RAC instances. Here is an
> example from 126.96.36.199 instance.
> Global Wait-For-Graph(WFG) at ddTS[0.44] :
> BLOCKED 8e919f168 3 [0x178001f][0x7789a],[TX] [65662,2751] 0
> BLOCKER 8e2aa44c8 3 [0x178001f][0x7789a],[TX] [65657,2197] 0
> BLOCKED 8e2aa4bd0 3 [0x70023][0x2f68bc],[TX] [65657,2197] 0
> BLOCKER 8e558a3c8 3 [0x70023][0x2f68bc],[TX] [65666,213] 0
> BLOCKED 8e558aad0 3 [0x179000f][0x89000],[TX] [65666,213] 0
> BLOCKER 8e819e070 3 [0x179000f][0x89000],[TX] [65707,152] 0
> BLOCKED 8e819e1d8 3 [0x20001][0x4955c6],[TX] [65707,152] 0
> BLOCKER 8e919f2d0 3 [0x20001][0x4955c6],[TX] [65662,2751] 0
> Capturing deadlock information is not so easy, especially,
> if the problem
> involves multiple instances. Not much information is
> printed in deadlock
> trace files (I think, 11g improves this, but haven't
> tested this thoroughly
> yet). Of course, to find why there is a deadlock need more
> information about
> blockers and blocked etc. View gv$lock is useless.
> gv$ges_enqueue can provide more information. But, it is
> slow since
> gv$ges_enqueue encompasses BL locks (Buffer cache locks)
> also, which tend to
> be numerous on huge SGAs. Not to mention the fact that,
> deadlock detection
> algorithm (controlled by _lm_dd_interval) is smaller than
> query response time (at least, in our case). So, I ended up
> writing a small
> script to debug a deadlock we recently encountered.
> Hrishy, you can find script rac_check_lock.ksh and
> rac_check_lock.sql here.
> Read and execute shell script(which will call sql script)
> to capture state
> of blockers and waiters in each node. This should give you
> more information
> to debug deadlocks. Please send output, if I can be of any
> orainternals_scripts_rac </a>
> Riyaj Shamsudeen
> Principal DBA,
> Ora!nternals - http://www.orainternals.com
> Specialists in Performance, Recovery and EBS11i
> Blog: http://orainternals.wordpress.com
> On Fri, Feb 13, 2009 at 12:34 PM, Yong Huang
> <yong321_at_yahoo.com> wrote:
> > I find that at least the cvt | held mode part is not
> right; it's always 5.
> > For instance,
> > Global Wait-For-Graph(WFG) at ddTS[0.5] :
> > BLOCKED 0xd933e9a0 5 wq 2 cvtops x1 [0x70011][0x5c8f3],[TX] [34000-0001-0000522D] 0
> > BLOCKER 0xd933e850 5 wq 1 cvtops x8 [0x70011][0x5c8f3],[TX] [5C000-0001-00000575] 0
> > BLOCKED 0xd933ec58 5 wq 2 cvtops x1 [0x180017][0x3935f],[TX] [5C000-0001-00000575] 0
> > BLOCKER 0xd933eaf0 5 wq 1 cvtops x8 [0x180017][0x3935f],[TX] [34000-0001-0000522D] 0
> > It was the simplest ORA-60 deadlock created on RAC (Oracle 10.2.0.4). Both
> > sessions were in the first instance. Of all RAC deadlock graphs I've seen,
> > mode is always 5. But other pieces of info may be correct.