Re: Oracle RAC and VIPs

From: Alessandro Vercelli <alever_at_libero.it>
Date: Tue, 15 Jul 2008 11:29:45 +0200
Message-Id: <K41JPL$73EBAC7190280FDDED528F29DC76097E@libero.it>


The exact time of the crash is not clear; it happened on the morning of May 9th and was a database crash, not a system crash. crsd.log reported many messages like:

2008-05-09 12:32:33.833: [ CRSEVT][3695033264]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/crs/bin/racgwrap(check) timed out for ora.<failednode>.ons! (timeout=600)

each message referred to a different resource.
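
To see at a glance which resources timed out and how often, something like the following could be run over crsd.log. It is only a minimal sketch: the pattern is modelled on the single line quoted above, the function names (find_timeouts, count_by_resource) are made up for illustration, and other Clusterware versions may format these messages differently.

```python
import re
from collections import Counter

# Pattern modelled on the one crsd.log line quoted above (assumption:
# other timeout messages in the log follow the same layout).
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*CRSEVT\]\[\d+\]\d*CAAMonitorHandler :: \d+:Action Script "
    r"(?P<script>\S+)\((?P<action>\w+)\) timed out for "
    r"(?P<resource>\S+)! \(timeout=(?P<timeout>\d+)\)"
)


def find_timeouts(lines):
    """Yield (timestamp, resource, action, timeout_seconds) per timeout message."""
    for line in lines:
        m = TIMEOUT_RE.match(line.strip())
        if m:
            yield m["ts"], m["resource"], m["action"], int(m["timeout"])


def count_by_resource(lines):
    """Count timeout messages per resource name."""
    return Counter(resource for _, resource, _, _ in find_timeouts(lines))
```

Calling count_by_resource(open(path)) on each node's crsd.log, where path is wherever that log lives on the node, would give one counter per node to compare.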

Last week, I tried to restart the failed node (in the meantime, other people made other attempts) and crsd.log reported, among other messages, the following:

2008-07-07 16:10:18.743: [ CRSRES][3781585840]0CRS-1028: Dependency analysis failed because of: 'Resource in UNKNOWN state: ora.<failednode>.vip'

crs_stat -t showed the ora.<failednode>.vip resource allocated on the partner node - not the failed one - with state UNKNOWN (as expected).

My opinion is that, at the time of the crash, the partner node attempted an automatic failover of the VIP, but the attempt failed; crsd.log of the partner node:

2008-05-09 11:55:55.278: [  CRSRES][3686595504]0Attempting to start `ora.<failednode>.vip` on member `<partnernode>`
2008-05-09 11:56:58.305: [  CRSAPP][3686595504]0StartResource error for ora.<failednode>.vip error code = -2
2008-05-09 11:57:05.429: [  CRSEVT][3697085360]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/crs/bin/racgwrap(check) timed out for ora.<failednode>.vip! (timeout=60)

and, finally:

2008-05-09 11:58:01.422: [ CRSRES][3686595504]0X_OP_StopResourceFailed : Stop Resource failed (File: rti.cpp, line: 1698

2008-05-09 11:58:01.422: [  CRSRES][3686595504][ALERT]0`ora.<failednode>.vip` on member `<partnernode>` has experienced an unrecoverable failure.
2008-05-09 11:58:01.422: [  CRSRES][3686595504]0Human intervention required to resume its availability.
2008-05-09 11:58:01.444: [  CRSRES][3686595504]0CRS-1028: Dependency analysis failed because of: 'Resource in UNKNOWN state: ora.<failednode>.vip'
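
To pull out every resource that crsd.log reports in UNKNOWN state, a rough sketch along these lines might help. It assumes the CRS-1028 message format quoted above and tolerates a line break before the quoted reason; the function name unknown_resources is hypothetical.

```python
import re

# Matches the CRS-1028 message quoted above; \s* lets the quoted reason
# sit either on the same line or on the following one.
UNKNOWN_RE = re.compile(
    r"CRS-1028: Dependency analysis failed because of:\s*"
    r"'Resource in UNKNOWN state: (?P<resource>[^']+)'"
)


def unknown_resources(log_text):
    """Return the sorted, de-duplicated resource names reported as UNKNOWN."""
    return sorted({m["resource"] for m in UNKNOWN_RE.finditer(log_text)})
```

The function takes the whole log as one string (for example the result of reading the file in one go), so a message wrapped over two lines is still matched.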

Sorry for the *mess* of messages.....

Thanks,

Alessandro

>If you think it's related to the resource not starting because of some
>dependency, then I'd suggest looking at
>$CRS_HOME/log/<nodename>/crsd/crsd.log on each node (especially the
>crashed node) and see what's there around the time of startup.
>
>If the node won't boot, try booting it into single user mode and
>disabling clusterware from starting if you think clusterware is what's
>not allowing it to boot completely.
>
>Dan
>
>Alessandro Vercelli wrote:
>> O.S.: RHEL AS4
>> Hardware is HP BL45P, 4 x AMD Dual core, 8 Gb RAM.
>> Oracle 10.2.0.1, RAC and Clusterware
>>
>> Anyway, the issue became "crabbed", since the last attempt to start the failing node succeeded, so I've one more task now...:)).
>>
>> The failed attempts reported on the console that the listener nodeapp could not start; looking into network configuration, I noticed vip IP address for the failing listener was not allocated on that node but on its partner; please, what log files do you suggest for errors?
>>
>> Thanks,
>>
>> Alessandro
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jul 15 2008 - 04:29:45 CDT
