Re: Help: Why 9i RAC always reboot?

From: David Fitzjarrell <oratune_at_msn.com>
Date: 13 Sep 2001 07:14:35 -0700
Message-ID: <32d39fb1.0109130614.574226a1@posting.google.com>

It appears you have an unsupported configuration: 9i for Linux is certified only for SuSE 7.1 with kernel 2.4.4 and the GNU C Library (glibc) 2.2. It is therefore not surprising that some of the more advanced features of 9i, RAC among them, fail to operate as expected.
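
One quick way to see how far a given box is from that configuration is to compare its kernel and C library versions against it. The following is a minimal Python sketch, assuming the certified values are exactly the ones named in this post (kernel 2.4.4, glibc 2.2) and treating any glibc 2.2.x as a match. The names `CERTIFIED_KERNEL`, `CERTIFIED_GLIBC` and `certification_mismatches` are hypothetical, not part of any Oracle tool, and the sketch does not check the distribution itself (SuSE vs. Red Hat).

```python
import platform
from typing import List

# Hypothetical constants encoding the certified configuration named in this post.
# These are NOT read from any Oracle certification source.
CERTIFIED_KERNEL = "2.4.4"
CERTIFIED_GLIBC = "2.2"


def certification_mismatches(kernel_release: str, libc_name: str, libc_version: str) -> List[str]:
    """Return human-readable mismatches; an empty list means kernel and glibc match."""
    problems = []

    # Kernel release strings often carry a vendor suffix (e.g. "2.4.4-custom"),
    # so compare only the part before the first "-".
    kernel_base = kernel_release.split("-")[0]
    if kernel_base != CERTIFIED_KERNEL:
        problems.append(f"kernel {kernel_release} (certified: {CERTIFIED_KERNEL})")

    # Assumption: any glibc 2.2.x counts as "glibc 2.2".
    if libc_name != "glibc":
        problems.append(f"C library {libc_name or 'unknown'} (certified: glibc)")
    elif libc_version != CERTIFIED_GLIBC and not libc_version.startswith(CERTIFIED_GLIBC + "."):
        problems.append(f"glibc {libc_version} (certified: {CERTIFIED_GLIBC})")

    return problems


if __name__ == "__main__":
    # platform.libc_ver() returns ("glibc", "<version>") on glibc-based systems,
    # and empty strings when it cannot determine the C library.
    name, version = platform.libc_ver()
    issues = certification_mismatches(platform.release(), name, version)
    if issues:
        print("Outside the certified configuration:")
        for issue in issues:
            print("  -", issue)
    else:
        print("Kernel and glibc match (distribution not checked).")
```

An empty result only means the kernel and glibc versions line up; it does not make an installation on another distribution supported.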

I notice you have been having this problem for some time, and Howard has already responded to you in a similar vein. I doubt that you'll find a resolution to your difficulties, especially since you have chosen to install 9i on a platform that Oracle does not, as of now, support. Different vendors ship different kernels (obvious to most, but necessary to state), and you have probably run into a situation where a call that works properly on SuSE 7.1 with kernel 2.4.4 fails on other Linux kernels.

Another thought is that this could be an issue of available memory (although that, in my mind, is quite a reach): how much RAM is installed in your Red Hat server? Increasing the available memory may resolve this issue, and I stress the word 'MAY'.
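
To answer the RAM question on a Linux box, one option is to read the `MemTotal` line of `/proc/meminfo`. Below is a minimal Python sketch, assuming a Linux `/proc` filesystem; `parse_memtotal_kb` is a hypothetical helper name. Note that `MemTotal` reports the memory usable by the kernel, which is slightly less than the physical RAM installed.

```python
# Assumes a Linux /proc filesystem; parse_memtotal_kb is a hypothetical helper.
def parse_memtotal_kb(meminfo_text: str) -> int:
    """Return the MemTotal value (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        # The line of interest looks like "MemTotal:       515564 kB";
        # every other line (including any summary table) is skipped.
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            kb = parse_memtotal_kb(f.read())
        print(f"Memory usable by the kernel: {kb / 1024:.0f} MB")
    except OSError:
        print("/proc/meminfo is not available on this system")
```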

I would heed Oracle's notice regarding the usability of its port of 9i to Linux. It is specific to SuSE 7.1, kernel 2.4.4, with glibc 2.2. And, believe me, Linux releases ARE NOT all alike.

David Fitzjarrell

u518615722_at_spawnkill.ip-mobilphone.net (young dba) wrote in message news:<l.1000322055.1123565673@[64.94.198.252]>...
> Our 9i (RAC) machine on Red Hat Linux 7.1 keeps rebooting itself, with
> the following errors:
> alert.log:
> Sat Sep 8 23:43:14 2001
> Errors in file /opt/oracle/product/9.0.1/admin/clustdb/bdump/lmon_1524.trc:
> ORA-29702: error occurred in Cluster Group Service operation
> Sat Sep 8 23:43:14 2001
> LMON: terminating instance due to error 29702
>
> lmon_1524.trc is as follows:
>
> kjxgmps: proposing substate 4
> kjxgmcs: Setting state to 3 4.
> Name Service recovery started
> Deleted all non-local name entries
> kjxgmps: proposing substate 5
> kjxgmcs: Setting state to 3 5.
> Broadcasted all local name entries for publish
> Replayed all pending requests
> kjxgmps: proposing substate 6
> kjxgmcs: Setting state to 3 6.
> Name Service normal
> Name Service recovery done
> *** 2001-09-08 23:39:32.090
> kjxgmps: proposing substate 7
> kjxgmcs: Setting state to 3 7.
> kjfmact: call ksimdic on instance (0)
> *** 2001-09-08 23:39:32.100
> *** 2001-09-08 23:39:32.100
> Reconfiguration started
> Synchronization timeout interval: 600 sec
> List of nodes: 1,
> Global Resource Directory frozen
> node 1
> * kjshashcfg: I'm the only node in the cluster (node 1)
> Active Sendback Threshold = 100 %
> Communication channels reestablished
> Server queues filtered
> Master broadcasted resource hash value bitmaps
> Non-local Process blocks cleaned out
> Resources and enqueues cleaned out
> Resources remastered 1043
> 1574 GCS shadows traversed, 0 cancelled, 87 closed
> 5155 GCS resources traversed, 0 cancelled
> 12290 GCS resources on freelist, 12922 on array, 12922 allocated
> set master node info
> 1574 GCS shadows traversed, 0 replayed, 87 unopened
> Submitted all remote-enqueue requests
> Update rdomain variables
> 0 write requests issued in 1487 GCS resources
> 0 PIs marked suspect, 0 flush PI msgs
> Dwn-cvts replayed, VALBLKs dubious
> All grantable enqueues granted
> *** 2001-09-08 23:39:32.371
> Reconfiguration complete
> *** 2001-09-08 23:43:14.562
> CM:CMRPC: write error occurred: errno=32, rcWrite=-1, bytesWritten=-1
> kjxggpoll: received an error event from DBALL_DB
> Return code from kjxggpoll: 10
> error 29702 detected in background process
> ORA-29702: error occurred in Cluster Group Service operation
> ksuitm: waiting for [5] seconds before killing DIAG
>
>
> Could somebody help me out?
>
> Thanks
Received on Thu Sep 13 2001 - 09:14:35 CDT