Re: Oracle RAC and VIPs
Date: Tue, 15 Jul 2008 11:29:45 +0200
The exact time of the crash is not known; it happened on the morning of May 9th. It was a database crash, not a system crash. crsd.log reported many messages like:
2008-05-09 12:32:33.833: [ CRSEVT]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/crs/bin/racgwrap(check) timed out for ora.<failednode>.ons! (timeout=600)
each message referred to a different resource.
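To see at a glance which resources hit the check timeout, the messages can be counted per resource. This is only a minimal sketch: the regular expression is modelled on the single message format quoted above, and the function name timed_out_resources is my own, not part of any Oracle tool.

```python
import re
from collections import Counter

# Matches the tail of a CAAMonitorHandler timeout message, e.g.
# "...racgwrap(check) timed out for ora.<failednode>.ons! (timeout=600)"
# Assumption: every timeout message follows this layout.
TIMEOUT_RE = re.compile(r"\((\w+)\) timed out for (\S+)! \(timeout=(\d+)\)")

def timed_out_resources(lines):
    """Count timeout messages per (resource, action, timeout) triple."""
    counts = Counter()
    for line in lines:
        # finditer copes with several log entries pasted onto one line
        for m in TIMEOUT_RE.finditer(line):
            counts[(m.group(2), m.group(1), int(m.group(3)))] += 1
    return counts

sample = [
    "2008-05-09 12:32:33.833: [ CRSEVT]0CAAMonitorHandler :: 0:Action Script "
    "/u01/app/oracle/product/crs/bin/racgwrap(check) timed out for "
    "ora.<failednode>.ons! (timeout=600)",
]
for (resource, action, timeout), n in timed_out_resources(sample).most_common():
    print(n, resource, action, timeout)
```

To run it against a real log, pass an open file object for crsd.log instead of the sample list.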
Last week, I tried to restart the failed node (in the meantime, other people made other attempts) and crsd.log reported, among other messages, the following:
2008-07-07 16:10:18.743: [ CRSRES]0CRS-1028: Dependency analysis failed because of: 'Resource in UNKNOWN state: ora.<failednode>.vip'
According to crs_stat -t, the ora.<failednode>.vip resource was allocated on the partner node, not on the failed one, and its state was UNKNOWN (as expected).
My opinion is that, at the time of the crash, the partner node attempted an automatic failover of the VIP, but the failover failed; crsd.log of the partner node:
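The same check can be scripted. This is a rough sketch that assumes the long-form output of plain crs_stat (blank-line-separated blocks of NAME=, TYPE=, TARGET=, STATE= lines, with STATE written as "<state> on <host>"); the tabular -t output is laid out differently and is not handled here. The node names in the sample are hypothetical.

```python
import re

def parse_crs_stat(text):
    """Parse NAME=/TYPE=/TARGET=/STATE= blocks into a list of dicts."""
    resources = []
    for block in re.split(r"\n\s*\n", text.strip()):
        attrs = dict(
            line.strip().split("=", 1)
            for line in block.splitlines() if "=" in line
        )
        if "NAME" in attrs:
            # Assumed form: "STATE=UNKNOWN on nodeb"; HOST is "" when absent
            state, _, host = attrs.get("STATE", "").partition(" on ")
            attrs["STATE"], attrs["HOST"] = state, host
            resources.append(attrs)
    return resources

def unknown_resources(resources):
    """Return (name, host) pairs for resources in UNKNOWN state."""
    return [(r["NAME"], r["HOST"]) for r in resources if r["STATE"] == "UNKNOWN"]

# Hypothetical sample, not taken from the real cluster
sample = """NAME=ora.nodea.vip
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on nodeb

NAME=ora.nodeb.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on nodeb
"""
print(unknown_resources(parse_crs_stat(sample)))
```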
2008-05-09 11:55:55.278: [ CRSRES]0Attempting to start `ora.<failednode>.vip` on member `<partnernode>`
2008-05-09 11:56:58.305: [ CRSAPP]0StartResource error for ora.<failednode>.vip error code = -2
2008-05-09 11:57:05.429: [ CRSEVT]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/crs/bin/racgwrap(check) timed out for ora.<failednode>.vip! (timeout=60)
2008-05-09 11:58:01.422: [ CRSRES]0X_OP_StopResourceFailed : Stop Resource failed (File: rti.cpp, line: 1698
2008-05-09 11:58:01.422: [ CRSRES][ALERT]0`ora.<failednode>.vip` on member `<partnernode>` has experienced an unrecoverable failure.
2008-05-09 11:58:01.422: [ CRSRES]0Human intervention required to resume its availability.
2008-05-09 11:58:01.444: [ CRSRES]0CRS-1028: Dependency analysis failed because of:'Resource in UNKNOWN state: ora.<failednode>.vip'
Sorry for the *mess* of messages.....
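To untangle excerpts like the ones above, the entries can be split on their leading timestamp and filtered by resource name. This is a sketch only: it assumes every crsd.log entry starts with "YYYY-MM-DD HH:MM:SS.mmm: [ COMPONENT]" as in the quoted lines, and the names split_entries and resource_timeline are my own.

```python
import re
from datetime import datetime

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}"
# Zero-width split point in front of each "timestamp: [" (needs Python 3.7+)
ENTRY_START = re.compile(r"(?=" + TS + r": \[)")
ENTRY_RE = re.compile(r"(" + TS + r"): \[\s*(\w+)\](.*)", re.S)

def split_entries(text):
    """Split raw log text into (timestamp, component, message) tuples."""
    entries = []
    for chunk in ENTRY_START.split(text):
        m = ENTRY_RE.match(chunk.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append((ts, m.group(2), m.group(3).strip()))
    return entries

def resource_timeline(text, resource):
    """Entries whose message mentions the resource, in time order."""
    return sorted(e for e in split_entries(text) if resource in e[2])

sample = (
    "2008-05-09 11:55:55.278: [ CRSRES]0Attempting to start "
    "`ora.<failednode>.vip` on member `<partnernode>` "
    "2008-05-09 11:56:58.305: [ CRSAPP]0StartResource error for "
    "ora.<failednode>.vip error code = -2"
)
for ts, component, message in resource_timeline(sample, "ora.<failednode>.vip"):
    print(ts.isoformat(sep=" ", timespec="milliseconds"), component, message)
```

Because the split happens on timestamps rather than on newlines, it also works when several entries end up on one line.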
>If you think it's related to the resource not starting because of some
>dependency, then I'd suggest looking at
>$CRS_HOME/log/<nodename>/crsd/crsd.log on each node (especially the
>crashed node) and see what's there around the time of startup.
>If the node won't boot, try booting it into single user mode and
>disabling clusterware from starting if you think clusterware is what's
>not allowing it to boot completely.
>Alessandro Vercelli wrote:
>> O.S.: RHEL AS4
>> Hardware is HP BL45P, 4 x AMD Dual core, 8 Gb RAM.
>> Oracle 10.2.0.1, RAC and Clusterware
>> Anyway, the issue became "crabbed", since the last attempt to start the failing node succeeded, so I've one more task now...:)).
>> The failed attempts reported on the console that the listener nodeapp could not start; looking into network configuration, I noticed vip IP address for the failing listener was not allocated on that node but on its partner; please, what log files do you suggest for errors?