RE: RAC Full cluster outage (almos)

From: Crisler, Jon <Jon.Crisler_at_usi.com>
Date: Wed, 11 Mar 2009 13:48:54 -0400
Message-ID: <56211FD5795F8346A0719FEBC0DB067503F33D5D_at_mds3aex08.USIEXCHANGE.COM>



I have seen this problem occur with Linux, due to a problem with GLIBC versions. I don't know if this happens under Solaris, but check Metalink for RAC issues with GLIBC. The fix is to install newer versions of GLIBC and relink.  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of LS Cheng
Sent: Wednesday, March 11, 2009 11:36 AM
To: Oracle-L
Subject: RAC Full cluster outage (almos)

Hi

A couple of days ago one of my customers faced an almost full cluster outage in a 2-node 10.2.0.4 RAC on Sun Solaris 10 SPARC (full Oracle stack).

The sequence was as follows

  1. node 2 lost private network, interface went down
  2. node 1 evicts node 2 (as expected)
  3. node 1 then evicts itself
  4. after node 1 returned to the cluster and the cluster re-formed from one node to two nodes, node 2 lost its private network again and this time node 2 was evicted

So it was not really a full cluster outage, but the evictions occurred one after another, so it looked like a full outage to the users.

My question is this: in a two-node cluster, node 1 should always survive, which was not the case here. My only theory is that node 2 was so ill that it could not reboot the server, so node 1 then evicted itself to avoid corruption.
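
To make that expectation concrete, here is a minimal sketch of the tie-break rule as it is commonly described for Clusterware split-brain resolution: the largest sub-cluster survives, and on a tie the sub-cluster holding the lowest node number survives. This is a toy model under that assumption, not Oracle's CSS code, and the function name surviving_subcluster is hypothetical.

```python
from typing import FrozenSet, Iterable


def surviving_subcluster(subclusters: Iterable[Iterable[int]]) -> FrozenSet[int]:
    """Pick the survivor among disjoint sub-clusters after an interconnect split.

    Assumed rule (not taken from Oracle source code): the largest sub-cluster
    wins, and a tie goes to the sub-cluster containing the lowest node number.
    """
    # Normalise each group to a frozenset and drop empty groups.
    groups = [g for g in (frozenset(s) for s in subclusters) if g]
    if not groups:
        raise ValueError("no sub-clusters given")
    # Larger size ranks first; among equal sizes, the smaller minimum
    # node number ranks first (hence the negation).
    return max(groups, key=lambda g: (len(g), -min(g)))


if __name__ == "__main__":
    # Two-node cluster whose interconnect has failed: node 1 vs node 2.
    print(sorted(surviving_subcluster([[1], [2]])))
```

Under this model a 1-versus-1 split leaves node 1 up, which is why step 3 in the sequence above is surprising.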

Any more ideas?

Cheers

--

LSC
--

http://www.freelists.org/webpage/oracle-l
Received on Wed Mar 11 2009 - 12:48:54 CDT
