

RE: RAC on OCFS2 acceptance testing

From: Kevin Closson <kevinc_at_polyserve.com>
Date: Thu, 28 Dec 2006 08:55:27 -0800
Message-ID: <5D2570CAFC98974F9B6A759D1C74BAD001C33885@ex2.ms.polyserve.com>

 >>>

>>>>>> 4. FO: Cascading failures [?]
>>>>> yes
>>>
>>>Could you elaborate? What kind of realistic cascading
>>>failure scenarios would you recommend?

Tail the ocssd logs and, as CRS is dealing with one failure, manually inject another on a different node. Take your pick. For instance, inject the loss of a connectivity path, wait until CRS is dealing with that, and then sever the interconnect from the server that is becoming the CRS master, and so on. Remember, CRS is a master-slave architecture. Be creative. Be ugly. Save yourself future headaches by finding issues now.
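
A minimal sketch of that sequencing in Python, under loud assumptions: the log path, the trigger pattern, and the `inject` callback are all hypothetical placeholders, not real CRS messages or tooling. It only shows the shape of the test: follow the log, and the moment a line suggests CRS is busy with the first failure, fire the second one.

```python
import re
import time
from typing import Callable, Iterable, Iterator, Optional


def follow(path: str, poll_seconds: float = 0.5) -> Iterator[str]:
    """Yield lines appended to a file, like `tail -f` (runs until interrupted)."""
    with open(path, "r", errors="replace") as handle:
        handle.seek(0, 2)  # start at the current end of the file
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)  # nothing new yet, poll again


def inject_on_match(lines: Iterable[str], pattern: str,
                    inject: Callable[[str], None]) -> Optional[str]:
    """Call `inject` once, on the first line matching `pattern`.

    `pattern` is whatever text in your logs marks the first failure being
    handled (an assumption; it depends on your version and setup).
    Returns the matching line, or None if the input ends without a match.
    """
    trigger = re.compile(pattern)
    for line in lines:
        if trigger.search(line):
            inject(line)  # hypothetical hook: the second, cascading failure
            return line
    return None
```

In a real run you would pass `follow(<your ocssd log path>)` as `lines` and make `inject` do whatever ugly thing you picked on a different node, for example downing the interconnect interface. Both of those are for you to fill in; nothing here knows about CRS internals.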

>>>
>>>---
>>>
>>>Given limited timeframe I'll stick to just the functional
>>>tests, but I can submit a proposal

...usually the case. Like I routinely point out, there are very few Oracle shops with enough manpower to actually do clustered Oracle right. The self-managed database thing and the clustered thing are not complementary really.
>>>
>>>The question that will obviously be asked is how relevant
>>>all these wild scenarios (e.g. "dd(1) loop to /dev/null
>>>using absurdly large values assigned to the ibs argument")
>>>to the application at hand? Fencing, split-brain and other
>>>fascinating problems might be quite real in some cases, but
>>>are they for this specific app?

... the wild scenarios I describe are there to simulate the situations servers can get into when things go wrong. Unless you have those application bugs sitting around as stimulus, how else do you create a memory starvation issue of the kind that can happen with simple bugs like stack recursion? The point is that RAC is supposed to help you survive a node being overloaded to the point of being "ill". Prove it.
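
If you would rather not lean on dd(1) for this, a rough stand-in is a small memory hog. The sketch below is an assumption-laden illustration, not the tool used in the tests discussed here: the function name, the chunk size, and the 4096-byte page size are all made up for the example. It allocates memory in chunks and touches every page so the memory is actually resident, with a hard cap so you decide how "ill" the node gets.

```python
def hog_memory(chunk_mib: int, max_chunks: int) -> list:
    """Allocate up to `max_chunks` chunks of `chunk_mib` MiB, touching each page.

    Returns the list of chunks; the memory stays allocated for as long as
    the caller holds on to that list.
    """
    page = 4096  # assumed page size; adjust for your platform
    chunks = []
    for _ in range(max_chunks):
        try:
            chunk = bytearray(chunk_mib * 1024 * 1024)
        except MemoryError:
            break  # the allocator refused; stop with what we have
        # Write one byte per page so the pages are really backed by memory.
        for offset in range(0, len(chunk), page):
            chunk[offset] = 1
        chunks.append(chunk)
    return chunks
```

Run it only on a node you are prepared to see evicted, raise `max_chunks` until the box starts to struggle, and watch what the rest of the cluster does.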

>>>Earlier this year, somebody mentioned (if I understood
>>>correctly) that there are problems managing a 2-node RAC
>>>deployed on OCFS2 hosted by SLES9 boxes due to the lack of
>>>quorum and the quality of the OCFS2 code.
>>>What's the likelihood of this happening though?

It is not theoretical. It is fact. Only you can tell us whether it will be OK to lose an entire cluster to an OCFS2 split brain just because you lost one of the 2 nodes. Sort of the antithesis of what you paid for, isn't it?
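
For what it is worth, the arithmetic behind the 2-node worry is easy to see with a generic strict-majority rule. This is only an illustration of the general quorum idea, assuming a simple majority vote; it is not a description of how OCFS2 actually decides who survives.

```python
def has_majority(total_nodes: int, surviving_nodes: int) -> bool:
    """True if the survivors are a strict majority of the configured nodes."""
    return surviving_nodes > total_nodes // 2


# With 2 nodes, a lone survivor is exactly half, so it is never a majority.
two_node_survivor_ok = has_majority(2, 1)    # False
# With 3 nodes, losing one still leaves a majority of 2.
three_node_survivors_ok = has_majority(3, 2)  # True
```

Under that rule a 2-node cluster cannot lose either node and still have a majority, which is why 2-node setups need some other tie-breaker.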

>>>BTW, is your "Database Utility for RAC" available only to
>>>the Polyserve customers? Does it work with OCFS2?

It is the PolyServe database product. It predates OCFS and is a super, super-set of OCFS, so no, it doesn't work with it.

There are people who are happy with OCFS. I also think there are a significant number of people that have never been through an Oracle license audit--two points that are joined at the hip.

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Dec 28 2006 - 10:55:27 CST
