Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: RAC unexpected reboot of nodes

From: hpuxrac <johnbhurley_at_sbcglobal.net>
Date: 9 Mar 2006 10:08:38 -0800
Message-ID: <1141927717.966545.67170@p10g2000cwp.googlegroups.com>

alek wrote:
> Hi,
>
> I'm quite new to the RAC field and I want to know if the following
> behavior is normal for such a configuration:
>
> A few weeks ago we managed to configure an Oracle 10.2.0.1 cluster.
> The configuration comprised 2 nodes and the underlying OS was
> Red Hat AS4. The installation went well, following all the installation
> steps in the official Oracle documentation. The OCR and the
> voting disks were configured on NFS. At that time we noticed that
> from time to time one of the nodes (not always the same one) was
> unexpectedly rebooted. Neither the system logs nor the Oracle logs
> offered any clues, so we concluded that NFS might be causing the problems.
> To test this, we decided to configure RAC on a single node
> just for testing purposes. The OCR, voting disks and the Oracle
> software were installed on OCFS2 partitions, so no NFS was
> involved. On this node we configured 2 Oracle instances, which worked
> fine for a while, but from time to time, or when the server is stressed
> with intensive SQL, the entire server reboots. After some searching
> on Metalink we found Bug.4741921/4556989 (36) INSTANCE
> RESTARTED AFTER SHUTDOWN ABORT IN RAC ENVIRONMENT, which is fixed in
> the 10.2.0.2 patch. We downloaded and installed the patch, but the
> strange behavior is still there. The server does reboot less
> frequently now, but we have no explanation
> for what really causes the reboots.
> Has anyone noticed the same behavior on a 10.2.0.x RAC configuration?
> Are there any workarounds for this?

Unexpected rebooting of nodes is unfortunately a pretty common occurrence under Linux, usually associated with some kind of lockup or access problem against the shared disk system.

There are some parameters you can try setting higher so that the clusterware tolerates longer periods of being unable to access the storage.
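
Before raising any timeout it may help to see how long the storage actually goes quiet. What follows is only a rough sketch in Python, not anything from Oracle's tooling: the function names, the sample count and the 0.5 second threshold are all made up for illustration, and the path you probe would be a file of your choosing on the shared storage. It times small reads and reports the slow ones. Note that reads can be served from the OS cache, so this can under-report real stalls.

```python
import os
import time


def time_read(path, nbytes=4096):
    """Return the seconds taken to open `path` and read up to `nbytes` bytes."""
    start = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    try:
        os.read(fd, nbytes)
    finally:
        os.close(fd)
    return time.monotonic() - start


def find_stalls(latencies, threshold):
    """Return (index, latency) pairs for samples slower than `threshold` seconds."""
    return [(i, t) for i, t in enumerate(latencies) if t > threshold]


def probe(path, samples=5, interval=1.0, threshold=0.5):
    """Time `samples` reads of `path`, `interval` seconds apart, and return the slow ones.

    `threshold` is an arbitrary illustrative value, not an Oracle setting.
    """
    latencies = []
    for _ in range(samples):
        latencies.append(time_read(path))
        time.sleep(interval)
    return find_stalls(latencies, threshold)
```

Running probe() against a file on the shared storage while the server is under load would show whether read latency spikes around the time of a reboot, which is useful evidence to attach to a support request.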

Jeffrey Hunter's site, www.idevelopment.info, has some long writeups on RAC Linux configuration that may be a step in the right direction.

Opening a TAR with Oracle Support is, for better or worse, probably another direction you need to pursue.

Received on Thu Mar 09 2006 - 12:08:38 CST

