Re: ASM Instance Not Up Due to Private IP mismatch between Nodes

From: <m.mudhalvan_at_aozorabank.co.jp>
Date: Tue, 22 May 2012 09:44:01 +0900
Message-ID: <OF9DD14C54.197679FD-ON49257A06.0003469D-49257A06.000407AA_at_aozorabank.co.jp>



Hi Andrew and Experts,

        Thanks for your response.

        After digging deeper, I think this is a bug:

        Oracle Note ID 1374360.1 and Bug 12425730

        The error messages match. We are now checking with Oracle Support to confirm.

        My logs contain the following entries, as described in the bug note:

$GRID_HOME/log/<hostname>/gipcd/gipcd.log

gipcd.l03:2012-05-19 13:38:37.212: [GIPCDMON][1101535552] gipcdMonitorSaveInfMetrics: inf[ 0]  eth1                 - rank   -1, avgms 30000000000.000000 [ 0 / 0 / 0 ]
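The gipcd entry above can also be checked programmatically. Below is a minimal Python sketch that pulls the interface index, name, rank and average latency out of a gipcdMonitorSaveInfMetrics line. The regular expression is inferred from this single excerpt, and treating a negative rank as "interface not usable" is an assumption, not documented gipcd behavior.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the one gipcd.log line in this thread;
# other releases may format the metrics differently.
INF_METRICS = re.compile(
    r"gipcdMonitorSaveInfMetrics:\s*inf\[\s*(?P<idx>\d+)\]\s+(?P<name>\S+)\s+-\s+"
    r"rank\s+(?P<rank>-?\d+),\s*avgms\s+(?P<avgms>[\d.]+)"
)


class InfMetrics(NamedTuple):
    index: int
    name: str
    rank: int
    avgms: float


def parse_inf_metrics(line: str) -> Optional[InfMetrics]:
    """Return the interface metrics found in a gipcd.log line, or None."""
    m = INF_METRICS.search(line)
    if m is None:
        return None
    return InfMetrics(int(m["idx"]), m["name"], int(m["rank"]), float(m["avgms"]))


if __name__ == "__main__":
    sample = ("gipcdMonitorSaveInfMetrics: inf[ 0]  eth1                 - rank   -1, "
              "avgms 30000000000.000000 [ 0 / 0 / 0 ]")
    metrics = parse_inf_metrics(sample)
    print(metrics)
    # Assumption: a negative rank means gipcd is not treating the interface as healthy.
    if metrics is not None and metrics.rank < 0:
        print(f"{metrics.name}: negative rank (assumed unusable)")
```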

$GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log

orarootagent_root.l02:2012-05-19 11:42:47.383: [ora.diskmon][1122298176] {0:0:2} [check] DiskmonAgent::check {
orarootagent_root.l02:2012-05-19 11:42:47.383: [ora.diskmon][1122298176] {0:0:2} [check] DiskmonAgent::check } - 0
orarootagent_root.l02:2012-05-19 11:42:48.586: [CLSFRAME][1146118256] TM [MultiThread] is changing desired thread # to 5. Current # is 4
orarootagent_root.l02:2012-05-19 11:42:48.587: [ AGFW][1111791936] {0:0:2} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: ora.cluster_interconnect.haip 1 1
orarootagent_root.l02:2012-05-19 11:42:48.587: [ora.cluster_interconnect.haip][1111791936] {0:0:2} [start] clsn_agent::abort {
orarootagent_root.l02:2012-05-19 11:42:48.587: [ora.cluster_interconnect.haip][1111791936] {0:0:2} [start] abort {
orarootagent_root.l02:2012-05-19 11:42:48.587: [ora.cluster_interconnect.haip][1111791936] {0:0:2} [start] abort command: start
orarootagent_root.l02:2012-05-19 11:42:48.587: [ora.cluster_interconnect.haip][1111791936] {0:0:2} [start] tryActionLock {
orarootagent_root.l02:2012-05-19 11:42:48.587: [ USRTHRD][1109690688] {0:0:2} Thread:[NetHAMain]stop {
orarootagent_root.l02:2012-05-19 11:42:48.702: [ USRTHRD][1099069760] {0:0:2} [NetHAMain] thread stopping
orarootagent_root.l02:2012-05-19 11:42:48.702: [ USRTHRD][1099069760] {0:0:2} Thread:[NetHAMain]isRunning is reset to false here
orarootagent_root.l02:2012-05-19 11:42:48.703: [ USRTHRD][1109690688] {0:0:2} Thread:[NetHAMain]stop }
orarootagent_root.l02:2012-05-19 11:42:48.703: [ USRTHRD][1109690688] {0:0:2} thread cleaning up
orarootagent_root.l02:2012-05-19 11:42:48.914: [ora.cluster_interconnect.haip][1109690688] {0:0:2} [start] Start of HAIP aborted
orarootagent_root.l02:2012-05-19 11:42:48.915: [ AGENT][1109690688] {0:0:2} UserErrorException: Locale is
orarootagent_root.l02:2012-05-19 11:42:48.915: [ora.cluster_interconnect.haip][1109690688] {0:0:2} [start] clsnUtils::error Exception type=2 string=
orarootagent_root.l02:2012-05-19 11:42:48.915: [ AGFW][1109690688] {0:0:2} sending status msg [CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
orarootagent_root.l02:2012-05-19 11:42:48.915: [ora.cluster_interconnect.haip][1109690688] {0:0:2} [start] clsn_agent::start }
orarootagent_root.l02:2012-05-19 11:42:48.915: [ AGFW][1113893184] {0:0:2} Agent sending reply for: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:333
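To look for the same HAIP start failure on the other node, a sketch like the following scans one or more orarootagent_root.log files for the two messages shown above. The signature strings are copied from this excerpt; whether they are exactly the criteria Oracle Note 1374360.1 uses is an assumption, so a hit is a reason to compare against the note, not a diagnosis. The script name find_haip.py is hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

# Signature strings copied from the orarootagent_root.log excerpt in this thread.
# Assumption: they match what Note 1374360.1 describes; verify against the note.
HAIP_SIGNATURES = (
    "Start of HAIP aborted",
    '"ora.cluster_interconnect.haip start" encountered the following error',
)


def find_haip_failures(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing a HAIP failure signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(sig in line for sig in HAIP_SIGNATURES):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_haip.py <orarootagent_root.log> [...]
    for arg in sys.argv[1:]:
        path = Path(arg)
        with path.open(errors="replace") as log:
            for number, text in find_haip_failures(log):
                print(f"{path}:{number}: {text}")
```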

Thanks & Regards
Mudhalvan M.M

From: Andrew Kerber <andrew.kerber_at_gmail.com>

To:     anelson77388_at_gmail.com
Cc:     1326914 MUDHALVAN.MUNISWAMY/AOZORABANK_at_AOZORABANK, 
Oracle-L_at_freelists.org
Date: 05/21/2012 10:10 PM
Subject: Re: ASM Instance Not Up Due to Private IP mismatch between Nodes

Seems like I ran into this once before. Try explicitly setting the cluster_interconnects parameter in the spfile or pfile.
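CLUSTER_INTERCONNECTS is an instance-specific parameter, so following this suggestion on a two-node cluster means one SID-qualified entry per instance. The sketch below only generates the ALTER SYSTEM statements for review; it does not connect to a database. The SIDs (+ASM1, +ASM2) and 192.168 addresses are hypothetical placeholders. Because the parameter is static, the statements use SCOPE = SPFILE, so the value takes effect at the next instance restart.

```python
import ipaddress
from typing import Dict, List


def interconnect_statements(addresses: Dict[str, str]) -> List[str]:
    """Build one SID-qualified ALTER SYSTEM statement per instance.

    `addresses` maps an instance SID to that node's private interconnect IP.
    CLUSTER_INTERCONNECTS is static, hence SCOPE = SPFILE: the new value is
    picked up when each instance is next restarted.
    """
    statements = []
    for sid, ip in sorted(addresses.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        statements.append(
            f"ALTER SYSTEM SET cluster_interconnects = '{ip}' "
            f"SCOPE = SPFILE SID = '{sid}';"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical SIDs and addresses; replace with each node's real eth1 address.
    for stmt in interconnect_statements({"+ASM1": "192.168.10.1", "+ASM2": "192.168.10.2"}):
        print(stmt)
```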

On Mon, May 21, 2012 at 7:56 AM, Allan Nelson <anelson77388_at_gmail.com> wrote:
The 169.254 addresses are what are termed zeroconf (link-local) addresses. You can google for more information on that topic if you are interested. They are being provided by a new feature of clusterware in 11.2.0.2 whose name escapes me at the moment. They are being used because your 192.168 addresses have different subnet masks: eth1 has a subnet mask of 255.255.255.0 and eth2 has a subnet mask of 255.255.0.0. When clusterware came up after the boot, it detected this misconfiguration. IPs that have different subnet masks can't talk, so clusterware provided addresses it could use.
The RAC will run on these addresses without problems, but my recommendation would be to fix the misconfiguration of the 192.168 addresses and restart your RAC. It is messy to leave them misconfigured, and you seem to want your interconnect to be on 192.168 anyway.

Allan
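The subnet-mask mismatch described above can be checked with Python's standard ipaddress module. In this sketch the addresses are hypothetical, since the thread masks the real ones as 192.168.X.X. It shows that addresses in the same 192.168 range end up on two different networks when one interface uses a /24 mask and the other a /16 mask, and it flags 169.254.x.x as link-local (zeroconf).

```python
import ipaddress

# Hypothetical addresses: the thread masks the real ones as 192.168.X.X.
NODE1_ETH1 = "192.168.10.1/255.255.255.0"
NODE2_ETH1 = "192.168.10.2/255.255.0.0"


def network_of(interface: str) -> ipaddress.IPv4Network:
    """Network that an 'address/netmask' pair belongs to."""
    return ipaddress.IPv4Interface(interface).network


def same_subnet(a: str, b: str) -> bool:
    """True only if both interfaces agree on network address and prefix length."""
    return network_of(a) == network_of(b)


def is_zeroconf(address: str) -> bool:
    """True for 169.254.0.0/16 link-local (zeroconf) addresses."""
    return ipaddress.IPv4Address(address).is_link_local


if __name__ == "__main__":
    print(network_of(NODE1_ETH1), network_of(NODE2_ETH1))  # 192.168.10.0/24 192.168.0.0/16
    print(same_subnet(NODE1_ETH1, NODE2_ETH1))              # False: the masks disagree
    print(is_zeroconf("169.254.20.5"))                       # True
```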

On Mon, May 21, 2012 at 3:57 AM, <m.mudhalvan_at_aozorabank.co.jp> wrote:

> Gurus,
> Good morning. We have a two-node RAC on 11g Release 2 (11.2.0.2).
>
> Last Saturday we had some maintenance that involved restarting the
> instances, including the ASM instance. Nothing was changed on the database
> or ASM side.
>
> When we tried to stop the ASM instance, the stop failed, and the instance
> was later aborted internally by the cluster service stop command.
>
> When we brought Node 1 up again, the ASM instance did not start and kept
> getting terminated. Checking the alert logs of both ASM nodes closely, we
> found that the private IP addresses did not match; they had matched until
> the restart. Because the IPs did not match, the disk groups were not
> getting mounted, so we restarted the server. After that, both ASM alert
> logs showed the private interconnect IP as 169.254.x.x, and everything is
> working fine.
>
> My questions to the gurus/experts are:
>
> 1. What might have caused the private interconnect IP segment to change,
> even though we specified 192.168.x.x as the private interconnect segment?
>
> 2. Is there any problem with both nodes running on the private
> interconnect IP segment 169.254.x.x?
>
>
> Node 1: ASM Alert Log
> Private Interface 'eth1' configured from GPnP for use as a private
> interconnect.
> [name='eth1', type=1, ip=192.168.X.X, net=192.168.X.0/24,
> mask=255.255.255.0, use=cluster_interconnect/6]
>
> Node 2: ASM Alert Log
> Private Interface 'eth1:1' configured from GPnP for use as a private
> interconnect.
> [name='eth1:1', type=1, ip=169.254.X.X, net=169.254.X.X/16,
> mask=255.255.0.0, use=haip:cluster_interconnect/62]
>
> Database alert logs (these errors occurred multiple times):
>
> CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running)
>
> CRS-5019:All OCR locations are on ASM disk groups [DG_DATA01], and none of these disk groups are mounted.
>
>
>
> Thanks & Regards
> Mudhalvan M.M
>
> Infrastructure Management Division
> Tel. 042(319)4516 Ext. 34516
> Mobile 81-80-4890-1973
> Email m.mudhalvan_at_aozorabank.co.jp
>
>



> "What lies ahead is Aozora."
> Note: This e-mail contains confidential information and is intended for use
> only by the recipient intended by the sender. If you are not the recipient
> the sender intended, any printing, copying, forwarding, or other use of this
> e-mail is prohibited. If you have received this e-mail in error, we
> apologize for the inconvenience and ask that you delete it and contact us
> immediately.
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
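The two GPnP descriptors from the ASM alert logs quoted above can also be compared mechanically. The sketch below parses a descriptor such as [name='eth1', type=1, ip=..., ...] into a dict and reports whether the two nodes agree on the interconnect network. The sample addresses are hypothetical stand-ins for the masked X.X values, and reading a use= value prefixed with haip: as an HAIP-assigned address is an inference from the Node 2 entry.

```python
import re
from typing import Dict

# One key=value pair inside the bracketed GPnP descriptor; values are either
# single-quoted (name='eth1:1') or run up to the next comma or closing bracket.
FIELD = re.compile(r"(\w+)=('[^']*'|[^,\]]+)")


def parse_descriptor(text: str) -> Dict[str, str]:
    """Parse "[name='eth1', type=1, ip=..., net=..., mask=..., use=...]" into a dict."""
    body = text[text.index("[") + 1 : text.rindex("]")]
    return {key: value.strip("'").strip() for key, value in FIELD.findall(body)}


def uses_haip(desc: Dict[str, str]) -> bool:
    # Inference from the Node 2 entry: a use= value prefixed with "haip:"
    # marks an address assigned by HAIP rather than the configured one.
    return desc.get("use", "").startswith("haip:")


def compare(a: Dict[str, str], b: Dict[str, str]) -> str:
    """Report whether two nodes put their interconnect on the same network."""
    if a["net"] == b["net"]:
        return f"OK: both on {a['net']}"
    return f"MISMATCH: {a['name']} on {a['net']} vs {b['name']} on {b['net']}"


if __name__ == "__main__":
    # Hypothetical values standing in for the masked 192.168.X.X / 169.254.X.X.
    node1 = parse_descriptor("[name='eth1', type=1, ip=192.168.10.1, net=192.168.10.0/24, "
                             "mask=255.255.255.0, use=cluster_interconnect/6]")
    node2 = parse_descriptor("[name='eth1:1', type=1, ip=169.254.20.5, net=169.254.0.0/16, "
                             "mask=255.255.0.0, use=haip:cluster_interconnect/62]")
    print(compare(node1, node2))  # MISMATCH: eth1 on 192.168.10.0/24 vs eth1:1 on 169.254.0.0/16
    print(uses_haip(node1), uses_haip(node2))  # False True
```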





-- 
Andrew W. Kerber

'If at first you don't succeed, don't take up skydiving.'


Received on Mon May 21 2012 - 19:44:01 CDT
