Re: Doubt about timeout between nodes of cluster

From: Waldirio Manhães Pinheiro <waldirio_at_gmail.com>
Date: Thu, 12 Jun 2008 14:42:12 -0300
Message-ID: <7df9f1820806121042x47494325p8f6d7e27a79dd267@mail.gmail.com>

    Hello Friend

  Thank you for the answer. Let's check.

2008/6/12, Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com>:
>
> Hello Waldirio
> >> the time for the first machine to detect that the second machine is
> >> powered off is very long (between 1 and 2 min),
> How are you measuring this time? Are you checking alert log or are you
> using DB connections to check it?

   I measured this time from the moment I sent the shutdown to the server until the second VIP interface was up on the second node (the backup node).
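
For a more repeatable number, the same interval could be timed by a small script instead of by hand. The following is only a sketch under assumptions: the function names, the polling interval, and the idea of probing the VIP over TCP are mine, not something Oracle Clusterware provides. A refused connection is counted as "the address answers", since some host replied for that IP.

```python
import socket
import time


def address_answers(host, port, timeout=1.0):
    """Return True if some host replies for host:port (accepted or refused)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # Something answered for this IP, even though nothing listens on the port.
        return True
    except OSError:
        # Timeout, no route to host, and similar: nobody answered.
        return False


def measure_failover(probe, interval=1.0, max_wait=300.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Return how many seconds probe() stayed False, or None on timeout.

    Waits for probe() to turn False (the outage starts), then for it to turn
    True again (the address is back), polling every `interval` seconds.
    """
    start = clock()
    down_at = None
    while clock() - start < max_wait:
        ok = probe()
        now = clock()
        if down_at is None:
            if not ok:
                down_at = now          # first failed probe: outage begins
        elif ok:
            return now - down_at       # address answers again
        sleep(interval)
    return None
```

It would be started before powering off the node, for example as `measure_failover(lambda: address_answers("cangua-vip", 1521))`. The host name and port there are placeholders, and the result is only accurate to within one polling interval.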

> Can you also send crsd.log?

OK, here is a link instead, because of the size: http://rafb.net/p/hqE13995.html

When I power off the first node, the crsd log on the second node (at the link above) shows on line 1 the message "[ COMMCRS][1147169120]clsc_receive: (0xc6d180) Error receiving, ns (12535, 12560), transport (505, 110, 0)" and then keeps reporting "Connection not active" until line 2045.
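
To compare several test runs, the fields of that message could be pulled out of crsd.log with a short script. This is a sketch that assumes every such line has exactly the layout of the one quoted above; the regular expression and the function name are my own, not part of any Oracle tool. My reading of the numbers is also an assumption: 110 matches ETIMEDOUT on Linux, which would fit a peer that simply stopped answering.

```python
import re

# Pattern built from the single sample line in this message (an assumption).
CLSC_RECEIVE = re.compile(
    r"\[\s*(?P<component>\w+)\]\[(?P<thread>\d+)\]clsc_receive: "
    r"\((?P<handle>0x[0-9a-fA-F]+)\) Error receiving, "
    r"ns \((?P<ns>[\d, ]+)\), transport \((?P<transport>[\d, ]+)\)"
)


def parse_clsc_receive(line):
    """Return the fields of a clsc_receive error line, or None if no match."""
    match = CLSC_RECEIVE.search(line)
    if match is None:
        return None

    def to_ints(field):
        # "12535, 12560" -> (12535, 12560)
        return tuple(int(part) for part in field.split(","))

    return {
        "component": match.group("component"),
        "thread": int(match.group("thread")),
        "handle": match.group("handle"),
        "ns": to_ints(match.group("ns")),
        "transport": to_ints(match.group("transport")),
    }


if __name__ == "__main__":
    sample = ("[ COMMCRS][1147169120]clsc_receive: (0xc6d180) Error receiving, "
              "ns (12535, 12560), transport (505, 110, 0)")
    print(parse_clsc_receive(sample))
```

Lines that do not match, such as the "Connection not active" ones, return None and can be counted separately.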

PS: Now the VIP address of the first node no longer migrates to the second node after the power off (maybe it will be necessary to reinstall the OS and Oracle Clusterware, because I have changed the system a lot while testing).

> Further, refer to $CRS_HOME/bin/racgvip; there are a few parameters, such as
> check interval, restart attempts, etc., controlling the behavior of VIP failover
> too. Not sure they are applicable when the machine is rebooted, since the heartbeat
> will fail before the VIP check.

Yes, I checked this file too, but did not change it.

Now, looking at the crsd log file, I believe Oracle knows when the other node is down, but what is responsible for performing the failover (bringing up the VIP aliases on the other machine)? (Script, Daemon, Angel :P )

Thank you, friends, for the help.
Waldirio

> Cheers
> Riyaj Shamsudeen
> The Pythian Group www.pythian.com
> Personal blog: orainternals.wordpress.com
>
> Waldirio Manhães Pinheiro wrote:
>
>> Hello Friends
>> I'd like to ask about Oracle RAC in a Linux environment. I installed two
>> machines with RedHat AS 4Up5 and Oracle 10.2.0.3 with
>> ClusterWare. The installation finished successfully and the database works
>> fine.
>> I checked the availability of my environment with the test below:
>> Station cambeba UP
>> Station cangua UP
>> # crs_stat -t
>> Name           Type        Target  State   Host
>> ------------------------------------------------------------
>> ora....BA.lsnr application ONLINE  ONLINE  cambeba
>> ora....eba.gsd application ONLINE  ONLINE  cambeba
>> ora....eba.ons application ONLINE  ONLINE  cambeba
>> ora....eba.vip application ONLINE  ONLINE  cambeba
>> ora....UA.lsnr application ONLINE  ONLINE  cangua
>> ora.cangua.gsd application ONLINE  ONLINE  cangua
>> ora.cangua.ons application ONLINE  ONLINE  cangua
>> ora.cangua.vip application ONLINE  ONLINE  cangua
>> ora.ora10gq.db application ONLINE  ONLINE  cangua
>> ora....q1.inst application ONLINE  ONLINE  cangua
>> ora....q2.inst application ONLINE  ONLINE  cambeba
>> At this point everything is OK, but when I force a power off on cangua or
>> cambeba (the names of my machines), the time for the first machine to detect
>> that the second machine is powered off is very long (between 1 and 2 min), so
>> if my client is working, it will lose the query because of the timeout.
>> I changed the configuration of the objects ora.cambeba.vip and
>> ora.cangua.vip, but without success.
>> Any idea how to fix this problem (decrease the check time between nodes
>> in the cluster)?
>> PS: I searched the list archives, but found nothing about this problem.
>>
>> Thanks in advance.
>> --
>> ______________
>> Atenciosamente
>> Waldirio
>> msn: wmp_at_sinope.com.br
>> Site: www.waldirio.com.br
>> Blog: blog.waldirio.com.br
>> PGP: www.waldirio.com.br/public.html
>>
>
>

-- 
______________
Atenciosamente
Waldirio
msn: wmp_at_sinope.com.br
Site: www.waldirio.com.br
Blog: blog.waldirio.com.br
PGP: www.waldirio.com.br/public.html

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jun 12 2008 - 12:42:12 CDT
