Sessions wait on "resmgr:cpu quantum" with little CPU load

From: Yong Huang <yong321_at_yahoo.com>
Date: Mon, 14 Jul 2008 12:57:44 -0700 (PDT)
Message-ID: <250584.74100.qm@web80602.mail.mud.yahoo.com>


Environment: 4-node RAC. A user reported an application slowdown that happened last Friday. I pulled up my "perfmon" log (a log of top output correlated with session wait events). For about 20 minutes, about 100 sessions on the 3rd node were waiting on the "resmgr:cpu quantum" event, with p1 mostly 1 or 2, occasionally 3 or 4. The top output captured on node 3 is below (head and sed are from my monitoring script):

top - 16:09:02 up 92 days, 10:03, 0 users, load average: 0.26, 1.16, 1.65
Tasks: 340 total, 1 running, 339 sleeping, 0 stopped, 0 zombie
Cpu(s): 17.8% us, 6.6% sy, 0.0% ni, 71.1% id, 4.0% wa, 0.1% hi, 0.3% si
Mem: 8293488k total, 8158512k used, 134976k free, 298864k buffers
Swap: 8388576k total, 940624k used, 7447952k free, 4252272k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
11434 oracle    -2   0 5241m 2.1g 2.1g S    1 26.9 122:53.97 ora_lms6_riscp13
  424 oracle    19   0 53216 1344 1136 S    0  0.0   0:00.00 head -15
  425 oracle    18   0 53408 1616 1392 S    0  0.0   0:00.00 sed s/  *$//
  451 root      18   0  4176 1424 1120 S    0  0.0   0:00.00 /bin/sleep 1
11410 oracle    -2   0 5241m 2.2g 2.2g S    0 27.6 309:45.19 ora_lms0_riscp13
11414 oracle    -2   0 5241m 2.1g 2.1g S    0 26.9 120:41.45 ora_lms1_riscp13
11418 oracle    -2   0 5241m 2.1g 2.1g S    0 26.9 119:07.49 ora_lms2_riscp13
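
The head and sed entries in the listing come from the monitoring pipeline itself. The actual script is not shown in this message; the following is only a minimal sketch of a pipeline of that shape, assuming top is run in batch mode. The function names snapshot_filter and take_snapshot are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep the first 15 lines of input and strip trailing
# spaces, mirroring the "head -15" and "sed s/  *$//" processes that show up
# in the captured top output.
snapshot_filter() {
    head -15 | sed 's/  *$//'
}

# Hypothetical wrapper (an assumption, not the original script): take one
# batch-mode iteration of top and pass it through the filter.
take_snapshot() {
    top -b -n 1 | snapshot_filter
}
```

Calling take_snapshot at a fixed interval and appending its output to a log file would give a record that can later be lined up against session wait events by timestamp.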

The DB sessions were from different applications, and the SQL IDs varied. No new datafile was added during that period (so it's not Bug 4602661).

Other nodes sometimes had high CPU usage, but it was always caused by "/home/oracle/product/10.2.0/crs/bin/crs_stat.bin -t", not by any DB server process (i.e. "oracleSID (LOCAL=...").

Nothing interesting is recorded in alert.log or /var/log/messages on any node. Resource manager was set up on this database a long time ago. It doesn't seem to limit what I want it to limit (the mysterious crs_stat.bin, probably from emagent or some monitoring script), and it limits what it should not. Any insight is appreciated.

Yong Huang

$ uname -a
Linux xxx03p 2.6.9-55.EL #1 SMP Fri Apr 20 16:30:19 EDT 2007 ia64 ia64 ia64 GNU/Linux

SQL> select * from v$version where rownum = 1;
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bi

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 14 2008 - 14:57:44 CDT
