blocked sessions within ASM

From: Adric Norris <landstander668_at_gmail.com>
Date: Wed, 23 Sep 2015 16:35:13 -0500
Message-ID: <CAJueESqu7U66tpS-L9_J6uJjCU1X-SpFs9MftYVZER3=Kc5_Lg_at_mail.gmail.com>



We're encountering a weird issue with blocking/blocked sessions within ASM on a 3-node cluster, which causes queries against some of the fixed views (v$asm_diskgroup being a prime example) to hang indefinitely. We've opened an SR and are actively working the issue with Oracle, but don't seem to be making much progress at the moment. :( The versions in play are 11.2.0.4.6 64-bit for CRS/ASM and most databases, along with a pair of stragglers running 11.2.0.3.13 64-bit, all on a trio of Solaris 10 SPARC 64-bit servers.

Here's an example of the blocking/blocked session info we see within ASM, which has been virtually identical across all occurrences.

[SYS_at_node1-vip/+ASM AS SYSDBA] SQL> select s1.inst_id, s1.sid, s1.state, s1.event, s1.seconds_in_wait,
  2         s1.blocking_instance, s1.blocking_session
  3     from gv$session s1
  4     where s1.blocking_session is not null
  5        or ( s1.inst_id, s1.sid ) in ( select s2.blocking_instance, s2.blocking_session
  6                                         from gv$session s2
  7                                         where s2.blocking_session is not null
  8                                     )
  9     order by s1.sid;

   INST_ID    SID STATE               EVENT                     SECONDS_IN_WAIT BLOCKING_INSTANCE BLOCKING_SESSION
---------- ------ ------------------- ------------------------- --------------- ----------------- ----------------
         1      4 WAITING             enq: DD - contention                47162                 1              446
         1     37 WAITING             enq: DD - contention               161507                 1              446
         1     38 WAITING             enq: DD - contention                11963                 1              446
         1     73 WAITING             enq: DD - contention                 6694                 1              446
         1    107 WAITING             enq: DD - contention                31266                 1              446
         1    140 WAITING             enq: DD - contention                55280                 1              446
         1    173 WAITING             enq: DD - contention                 7511                 1              446
         1    242 WAITING             enq: DD - contention                29064                 1              446
         1    274 WAITING             enq: DD - contention                 3080                 1              446
         1    446 WAITING             rdbms ipc reply                         2                 1              613
         1    613 WAITED SHORT TIME   CSS operation: action              421608
         1    616 WAITING             enq: DD - contention                14797                 1              446
         1    649 WAITING             enq: DD - contention                45710                 1              446
         1    685 WAITING             enq: DD - contention                 4952                 1              446
         1    717 WAITING             enq: DD - contention                 1511                 1              446
         1    752 WAITING             enq: DD - contention               211429                 1              446
         1    786 WAITING             enq: DD - contention               120149                 1              446
         1    818 WAITING             enq: DD - contention                29457                 1              446
         1    854 WAITING             enq: DD - contention               177882                 1              446
         1    888 WAITING             enq: DD - contention                25441                 1              446
         1    922 WAITING             enq: DD - contention                35106                 1              446
         1    954 WAITING             enq: DD - contention                38162                 1              446
         1    956 WAITING             enq: DD - contention               109381                 1              446
         1    990 WAITING             enq: DD - contention                 6053                 1              446
         1   1024 WAITING             enq: DD - contention                28791                 1              446
         1   1058 WAITING             enq: DD - contention                 6014                 1              446

26 rows selected.

The root blocker (SID 613) turns out to be the RBAL process, which isn't actually active... no rebalance is underway, the process state always shows as SLEEP at the OS level, and the dia0 trace file confirms that it's not in a wait. The mid-level blocker is SID 446, a connection from one of the local 11.2.0.4 database instances. Everything else also corresponds to local DB connections, all of which are waiting on SID 446.
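
For reference, a query along these lines (the standard gv$session-to-gv$process join on paddr/addr) is one way to confirm which process sits behind a given SID; the SID values below are just the ones from the listing above:

-- map a SID to its server/background process; PNAME identifies
-- background processes (RBAL etc.) and SPID is the OS process ID
select s.inst_id, s.sid, p.pname, p.spid, s.program
  from gv$session s
  join gv$process p
    on p.inst_id = s.inst_id
   and p.addr = s.paddr
 where s.inst_id = 1
   and s.sid in (446, 613);   -- SIDs from the output above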

At this point, I'm pretty sure that all of the blocked sessions were initiated by one of our DB monitoring jobs (which has been running without issue for years)... it runs one query on each database to check FRA usage, then follows up with a query against v$asm_diskgroup to determine available space in the underlying diskgroup (sketched below). Once the issue occurs, all queries against v$asm_diskgroup (and also v$asm_disk, IIRC) hang indefinitely on the affected ASM instance, regardless of whether they're issued from a database or directly against ASM.
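
The actual job is site-specific, so treat this as a rough sketch of its shape rather than the real code:

-- step 1, run on each database: FRA usage from v$recovery_file_dest
select name,
       round(space_used / nullif(space_limit, 0) * 100, 1) as pct_used
  from v$recovery_file_dest;

-- step 2: free space in the underlying diskgroup... this is the
-- query that hangs once the issue has occurred
select name, state, total_mb, free_mb
  from v$asm_diskgroup;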

To date, the only known way to clear this blockage is to shut down and restart ASM on the affected node. To make matters worse, the issue also prevents any diskgroup from being mounted clusterwide... that stops the cluster stack from starting, so when multiple ASM instances are impacted they must *all* be shut down before any node can rejoin the cluster. Fortunately we've never seen it strike all three ASM instances simultaneously (yet), but it has turned up in 2 of the 3 (with no reason to suspect the last one is immune).
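
One note for anyone facing the same restart: before bouncing ASM it's worth checking which database instances are still attached to it, so they can be stopped first. The standard v$asm_client view (queried from the ASM instance) shows this; a minimal example:

-- list database instances currently attached to this ASM instance
select group_number, instance_name, db_name, status
  from v$asm_client;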

Does this sound familiar to anyone? Suggestions on how to troubleshoot it further would be greatly appreciated.

-- 
"In the beginning the Universe was created. This has made a lot of people
very angry and been widely regarded as a bad move." -Douglas Adams
