Question re racgmain processes running amok

From: William Wagman
Date: Fri, 21 Mar 2008 13:21:11 -0700


The question pertains to a two-node RAC cluster running Oracle SE on 32-bit Linux 2.6.9-67.ELsmp. CRS, ASM and RDBMS are each in a separate home. Yesterday on node 1 I started seeing messages in /var/log/messages of the form:

Mar 20 07:5:34 spenser init: Id "h3" respawning too fast: disabled for 5 minutes

We looked around to try to determine the cause but didn't come up with anything immediately. A core dump was generated in the $CRS_HOME/log/<node_name>/crsd directory at about the time we first noticed the messages. Error messages indicating various failures (I can provide a segment) also appeared in crsd.log at this time. At that point I didn't know what was occurring, so I opened an SR with Oracle.

This morning, while gathering some additional information, I found that node 2 of this cluster had a large number of racgmain processes running, and the number was increasing. All of the swap space and virtually all of the memory on this node were in use. Some of the processes were running out of the CRS home and some out of the ASM home. I investigated whether it would be possible to stop these processes gracefully but was unable to find any information. Ultimately we rebooted node 2, and everything appears to be functioning as expected at this point.

My question is: what would cause the racgmain processes to run amok this way? Currently ps -ef|grep racgmain shows none running on either node. I'm puzzled by this, and other than information indicating that this process is part of ONS, I am not able to find any further details. Any suggestions would be greatly appreciated.
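
In case it is useful to anyone watching for a recurrence, here is a minimal sketch of a way to count racgmain processes per home rather than eyeballing the grep output. It assumes racgmain shows up as the first word of the command line in ps output; the function names and the example paths in the comments are hypothetical, not taken from our system.

```python
import subprocess
from collections import Counter


def count_racgmain(ps_output):
    """Count racgmain processes per executable path.

    ps_output is the text printed by `ps -eo args` (one command line per row).
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        exe = fields[0]
        # Compare the base name only, so a path such as
        # /some/home/bin/racgmain (hypothetical) matches while
        # "grep racgmain" does not.
        if exe.rsplit("/", 1)[-1] == "racgmain":
            counts[exe] += 1
    return counts


def snapshot():
    """Run ps once and return the per-path racgmain counts."""
    out = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    ).stdout
    return count_racgmain(out)
```

Calling snapshot() every minute or so from cron and logging the result would show whether the count is climbing, and whether the growth is in the CRS home or the ASM home, before memory and swap are exhausted.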


Bill Wagman
Univ. of California at Davis
IET Campus Data Center
(530) 754-6208

Received on Fri Mar 21 2008 - 15:21:11 CDT
