RE: Failover testing with 10g RAC
Date: Fri, 30 May 2008 08:57:20 -0700
I don't know how or when CRS decides it is going to reboot the node, but if you kill the crsd.bin process the node will reboot. That is part of its job, I think.
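A minimal sketch of that destructive test, assuming a typical 10g CRS install (the log path and environment variable are the conventional 10g locations, not something stated in this thread) -- only try this on a test cluster you expect to reboot:

```shell
# DESTRUCTIVE -- run only on a test cluster.
# Kill the CRS daemon to observe how CRS reacts (restart or node reboot).
CRSD_PID=$(pgrep -f crsd.bin)
if [ -n "$CRSD_PID" ]; then
    echo "Killing crsd.bin (pid $CRSD_PID) -- expect CRS to react"
    kill -9 "$CRSD_PID"
fi
# Then watch the CRS alert log for the restart/eviction messages, e.g.:
# tail -f $ORA_CRS_HOME/log/$(hostname)/alert$(hostname).log
```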
Univ. of California at Davis
IET Campus Data Center
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bradd Piontek
Sent: Friday, May 30, 2008 8:49 AM
Subject: Re: Failover testing with 10g RAC
Are the pieces you are failing redundant in nature? For example, multiple HBAs, switches, etc.? We had some issues in our fail-over testing that had to do with Service Processor fail-over; it was due to a Linux kernel issue and nmi watchdog processes (again, this was on Linux). Without redundancy in the components you mentioned, I would expect CRS to reboot the node. What are you using for OCR and Voting Disk?
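For reference, the OCR and voting disk configuration can be checked with the standard 10g clusterware utilities (read-only; run as the CRS owner, or root for full ocrcheck output):

```shell
# Read-only checks of OCR and voting disk configuration on 10g CRS
ocrcheck                      # OCR location(s), integrity, free space
crsctl query css votedisk     # configured voting disk paths
crsctl check crs              # overall clusterware daemon health
```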
Twitter: http://www.twitter.com/piontekdd Oracle Blog: http://piontekdd.blogspot.com Linked In: http://www.linkedin.com/in/piontekdd Last.fm: http://www.last.fm/user/piontekdd/
On Fri, May 30, 2008 at 10:21 AM, Jeffery Thomas <jeffthomas24_at_gmail.com> wrote:
Solaris 10, RAC 10.2.0.3. Using IPMP groups for NIC redundancy.
We've been conducting failover testing -- disabling an HBA port, powering
off a switch, yanking an interconnect link, etc.
In every single case, CRS rebooted the server where the dire deed was done,
and when the server came back up, the repair was successful, e.g. it failed over to
the secondary HBA port, or the physical IP for the IPMP group floated to the standby
NIC, and so forth.
The other server stayed up and all Oracle components remained available. In
the switch power-off test, the physical IP for the interconnect actually floated over to the
standby NIC with no outage on this server.
Is this what is to be expected? Will CRS always reboot a server to protect
itself when an underlying hardware failure is detected?
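One way to exercise the IPMP float described above without pulling cables is Solaris 10's if_mpadm, which detaches an interface and forces its addresses onto the standby. The interface name (ce0) is an assumption for illustration -- substitute the members of your own IPMP group:

```shell
# Solaris 10: inspect IPMP group membership, then force a failover
# in software rather than by yanking hardware.
ifconfig -a | grep -i group      # which NICs belong to which IPMP group
if_mpadm -d ce0                  # detach ce0; its addresses float to the standby
if_mpadm -r ce0                  # reattach ce0 when the test is done
```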
--
http://www.freelists.org/webpage/oracle-l
Received on Fri May 30 2008 - 10:57:20 CDT