reliable message - bug PMON involved

From: Grzegorz Goryszewski <grzegorzof_at_interia.pl>
Date: Sat, 26 Nov 2011 16:54:59 +0100
Message-ID: <4ED10BD3.2030505_at_interia.pl>



Hi,
It's more of a blog post, but I'm not blogging, so maybe I'll share it here :). Looks like we hit (in a 10.2.0.3 environment):
DBMS_SERVER_ALERT.SET_THRESHOLD HANGS FOREVER AT RELIABLE MESSAGE [ID 794589.1]

That doesn't look too bad (reliable message is an idle wait, right?), but when I tried to deal with the hanging processes via kill -9, the processes disappeared from the OS process list, yet from Oracle's point of view we still have sessions for those ospids and PMON is unable to clean those sessions up properly. From the PMON trace:
*** 2011-11-25 13:44:35.047
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
*** 2011-11-25 13:44:45.060
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
*** 2011-11-25 13:44:47.064
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
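To double-check from the OS side that the ospids PMON keeps reporting as dead really are gone, something like the following C sketch could help. It is only an illustration under assumptions: the helper names parse_pmon_dead and os_pid_alive are hypothetical, the record format is inferred from the trace excerpt above (not from any Oracle documentation), and the existence check relies on the POSIX behaviour of kill(pid, 0), which sends no signal and only reports whether the target process exists. It says nothing about how PMON itself decides that a process is dead.

```c
#define _POSIX_C_SOURCE 200809L /* for kill() under strict C modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if the OS still has a process with this pid, 0 if not.
   kill(pid, 0) sends no signal; it only does the existence and permission
   checks. EPERM means the process exists but belongs to another user. */
int os_pid_alive(pid_t pid) {
    assert(pid > 0); /* 0 and negative values address process groups */
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM ? 1 : 0;
}

/* Hypothetical helper: scan PMON trace text for records shaped like
   "found process <addr> pid=N serial=N ospid = N dead" (format assumed from
   the excerpt above) and store up to max ospids. Works whether records are
   one per line or run together. Returns the number of ospids stored. */
int parse_pmon_dead(const char *text, long *ospids, int max) {
    assert(max > 0);
    int n = 0;
    const char *p = text;
    while (n < max && (p = strstr(p, "found process")) != NULL) {
        int pid, serial, used = 0;
        long ospid;
        /* %n is only reached (used > 0) if the trailing "dead" matched */
        if (sscanf(p, "found process %*s pid=%d serial=%d ospid = %ld dead%n",
                   &pid, &serial, &ospid, &used) == 3 && used > 0)
            ospids[n++] = ospid;
        p += strlen("found process");
    }
    return n;
}
```

For the excerpt above this collects ospids 15291 and 30345; if os_pid_alive returns 0 for both, the OS has already reaped them and only Oracle's own process/session state is left. An unreaped zombie would still count as alive in this check.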

In the alert log: PMON is unable to clean up process, bla bla. After restarting the EM Grid agents there are two new processes hanging in dbms_server_alert.set_threshold, still on reliable message. When you strace that process you can see:
strace -p 30540
Process 30540 attached - interrupt to quit
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 352946}, ru_stime={0, 95985}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 352946}, ru_stime={0, 95985}, ...}) = 0
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)

So it's a timeout on a semaphore set operation call (semtimedop with a 1-second timeout, returning EAGAIN every time). There is an SR open, but Oracle hasn't responded so far. I don't want to be too dramatic, but I'm sure shutdown immediate will not help here :).
Any ideas how to deal with sessions hanging on that event (reliable message)?
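The repeating semtimedop(819218, ..., 1, {1, 0}) = -1 EAGAIN lines match the documented behaviour of System V semaphores: the process asks to decrement a semaphore that nobody has posted, waits at most 1 second, gets EAGAIN when the timeout expires, and tries again. A minimal C sketch of that wait pattern follows, assuming Linux/glibc (semtimedop is not part of POSIX), a private semaphore set instead of Oracle's, and hypothetical helper names (make_unposted_sem, wait_one_second, wait_loop). It only reproduces the shape of the loop; it makes no claim about what Oracle waits for inside reliable message or which process is supposed to post the semaphore.

```c
#define _GNU_SOURCE /* glibc declares semtimedop() only with _GNU_SOURCE */
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <time.h>

/* glibc does not define union semun; semctl(2) asks callers to define it. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private set with one semaphore at value 0 ("nobody has posted"). */
int make_unposted_sem(void) {
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;
    union semun arg;
    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* One attempt shaped like the traced call semtimedop(id, ops, 1, {1, 0}):
   decrement semaphore 0, giving up after 1 second.
   Returns 0 on success, otherwise errno (EAGAIN when the timeout expires). */
int wait_one_second(int semid) {
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
    if (semtimedop(semid, &op, 1, &timeout) == 0)
        return 0;
    return errno;
}

/* Retry like the traced process: EAGAIN (timeout) and EINTR mean "try again".
   Counts timeouts in *timeouts. Returns 0 once posted, EAGAIN if all
   max_tries attempts failed, or any other errno value immediately. */
int wait_loop(int semid, int max_tries, int *timeouts) {
    assert(max_tries > 0 && timeouts != NULL);
    *timeouts = 0;
    for (int i = 0; i < max_tries; i++) {
        int rc = wait_one_second(semid);
        if (rc == 0)
            return 0;
        if (rc == EAGAIN)
            (*timeouts)++;
        else if (rc != EINTR)
            return rc;
    }
    return EAGAIN;
}
```

In the sketch, wait_loop keeps returning EAGAIN for as long as nobody calls semop with sem_op = +1 on the set; a single post makes the next wait succeed at once.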
Regards
GregG

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Nov 26 2011 - 09:54:59 CST
