Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Split-brain among HACMP cluster and Oracle9RAC

Re: Split-brain among HACMP cluster and Oracle9RAC

From: tawright915 <tawright915_at_gmail.com>
Date: 21 Sep 2006 13:47:52 -0700
Message-ID: <1158871672.112827.186630@m7g2000cwm.googlegroups.com>


Hey... even better, look at Marathon Technologies' fault-tolerant solution. I've seen PD database servers fail without missing a heartbeat while the DB was being pounded for data. Seriously, check it out. Email me if you'd like me to answer anything about it.

Tom
Arne S wrote:
> Background:
> Part of our production environment is based on RS/6000 technology, with
> HACMP and Oracle9RAC as products on top. We have 4 p570s (4-way),
> running AIX 5.3ML03, HACMP version 5.2 and Oracle RAC version 9.2.0.7.
> These machines are spread across 2 server rooms (about 300 meters
> apart). HACMP is configured with concurrent disk access for Oracle
> db-files on raw devices. We have also configured HACMP with both IP and
> NON-IP heartbeat (NON-IP heartbeat over SAN disks). Oracle's
> interconnect is configured as part of the HACMP configuration. In total
> there are about 20 databases and 80 instances.
>
> My problem:
> During a test failover (the network in one server room goes down) I
> observed that all Oracle databases went into a "frozen" state. As far
> as I know, this is not correct. I am having trouble finding out why, but
> my guess is that Oracle is waiting for a "network down" or "node down"
> event from HACMP before it takes any action. That will never happen,
> because in this situation HACMP is still talking to all 4 nodes over the
> NON-IP network on the SAN disks. When I shut down the 2 "isolated"
> machines, all Oracle databases went down (lmon died). I had to start all
> databases manually on the 2 "surviving" nodes. After startup I could
> access the databases as normal.
>
> I have been in contact with Oracle Support, and they say: "The
> configuration is insane. The fix is to configure the clusterware
> heartbeat and the oracle heartbeat on the same network. HACMP and our
> clusterware must see the same view of the cluster."
>
> But what about the NON-IP heartbeat? HACMP MUST be configured to do
> heartbeating over both IP and NON-IP networks to avoid a split cluster
> and disk/data corruption.
>
> I don't think we are the only customer running AIX, HACMP,
> concurrent disk access on raw devices and Oracle9RAC. Therefore I hope
> that you or somebody else can help me resolve this issue.
>
> I have opened service requests with both Oracle Support and IBM
> Support, and I hope that somebody can help solve this issue. But each
> party blames the other's product....
>
> Any ideas? Should I add a custom action in HACMP to disable the NON-IP
> disk heartbeat network if this happens? Sounds like a lot of shampoo for
> the hairless... I would expect this to work more or less "out of the
> box", since the product certification matrix is OK..? (Yes, I know HACMP
> is not an out-of-the-box product; I think I have pretty good control of
> my HACMP.)
>
> Any ideas?
>
> Thanks for your time, and thanks in advance!
>
> ArneS
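
To make the suspected mismatch in the quoted message concrete, here is a toy Python model of the two membership views. It is only a sketch. The node names, room layout and function names are hypothetical, and it does not use any HACMP or Oracle API. It assumes that a node whose room has lost its IP network sees only itself over IP, while the NON-IP disk heartbeat still reaches every node.

```python
# Toy model only: node names, rooms and functions are hypothetical,
# not part of HACMP or Oracle RAC.
NODES = {"a1": "roomA", "a2": "roomA", "b1": "roomB", "b2": "roomB"}


def ip_view(node, ip_down_rooms):
    """Nodes that `node` can see over the IP interconnect."""
    if NODES[node] in ip_down_rooms:
        # Assumption: a node in a room without IP network sees only itself.
        return {node}
    return {n for n, room in NODES.items() if room not in ip_down_rooms}


def hacmp_view(node, ip_down_rooms, disk_heartbeat=True):
    """Nodes the cluster manager considers alive, as seen from `node`."""
    if disk_heartbeat:
        # Assumption: the NON-IP heartbeat over SAN disks reaches every node.
        return set(NODES)
    return ip_view(node, ip_down_rooms)


def views_disagree(ip_down_rooms, disk_heartbeat=True):
    """True if any node gets different membership from the two heartbeats."""
    return any(
        ip_view(n, ip_down_rooms) != hacmp_view(n, ip_down_rooms, disk_heartbeat)
        for n in NODES
    )


if __name__ == "__main__":
    print(views_disagree({"roomB"}))                        # IP down in one room
    print(views_disagree({"roomB"}, disk_heartbeat=False))  # IP heartbeat only
```

In this model the two views disagree only when the IP network is down somewhere and the disk heartbeat is still up, which matches the scenario described in the quoted message. Whether that is what actually froze the databases is not confirmed anywhere in this thread.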
Received on Thu Sep 21 2006 - 15:47:52 CDT
