Re: Failover testing with 10g RAC

From: Bradd Piontek
Date: Fri, 30 May 2008 10:48:35 -0500

  Are the components you are failing redundant? For example, do you have multiple HBAs, switches, etc.? We had some issues in our failover testing related to Service Processor failover; they turned out to be caused by a Linux kernel issue and NMI watchdog processes (again, this was on Linux). Without redundancy in the components you mentioned, I would expect CRS to reboot the node. What are you using for the OCR and voting disk?

Bradd Piontek

On Fri, May 30, 2008 at 10:21 AM, Jeffery Thomas wrote:

> Solaris 10, RAC Using IPMP groups for NIC redundancy.
>
> We've been conducting failover testing -- disabling a HBA port, power
> off a switch, yank an IC link, etc.
>
> In every single case, CRS rebooted the server where the dire deed was
> performed, and when the server came back up, the repair was successful,
> e.g. failed over to the secondary HBA port, or the physical IP for the
> IPMP group floated to the standby NIC and so forth.
>
> The other server stayed up and all Oracle components remained
> available. During the switch power off test, the physical IP for the IC
> actually floated over to the standby NIC with no outage on this server.
>
> Is this what is to be expected? CRS will always reboot a server to repair
> itself when an underlying hardware failure is detected?
>
> Thanks,
> Jeff
Received on Fri May 30 2008 - 10:48:35 CDT

