RE: Failover testing with 10g RAC
Date: Fri, 30 May 2008 08:57:20 -0700
Message-ID: <FE043305B38A0F448F3924429D650C2A07DE485F@VEXBE2.ex.ad3.ucdavis.edu>
Greetings,
I don't know how or when CRS decides to reboot the node, but if you kill the crsd.bin process the node will reboot. That is part of its job, I think.
Bill Wagman
Univ. of California at Davis
IET Campus Data Center
wjwagman_at_ucdavis.edu
(530) 754-6208
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bradd Piontek
Sent: Friday, May 30, 2008 8:49 AM
To: jeffthomas24_at_gmail.com
Cc: oracle-l
Subject: Re: Failover testing with 10g RAC
Jeff,
Are the pieces you are failing redundant in nature? For example,
multiple HBAs, switches, etc.? We had some issues in our fail-over testing
that had to do with Service Processor fail-over, and they were due to a
Linux kernel issue and NMI watchdog processes (again, this was on
Linux). Without redundancy in the components you mentioned, I would
expect CRS to reboot the node. What are you using for OCR and Voting
Disk?
--
Bradd Piontek
Twitter: http://www.twitter.com/piontekdd
Oracle Blog: http://piontekdd.blogspot.com
Linked In: http://www.linkedin.com/in/piontekdd
Last.fm: http://www.last.fm/user/piontekdd/
On Fri, May 30, 2008 at 10:21 AM, Jeffery Thomas <jeffthomas24_at_gmail.com> wrote:
Solaris 10, RAC 10.2.0.3. Using IPMP groups for NIC redundancy.
We've been conducting failover testing -- disabling an HBA port, powering
off a switch, yanking an IC link, etc.
In every single case, CRS rebooted the server where the dire deed was
performed, and when the server came back up, the repair was successful,
e.g. failed over to the secondary HBA port, or the physical IP for the
IPMP group floated to the standby NIC and so forth.
The other server stayed up and all Oracle components remained available.
During the switch power off test, the physical IP for the IC actually
floated over to the standby NIC with no outage on this server.
Is this what is to be expected? CRS will always reboot a server to repair
itself when an underlying hardware failure is detected?
Thanks,
Jeff
--
http://www.freelists.org/webpage/oracle-l
Received on Fri May 30 2008 - 10:57:20 CDT