Re: Long connect time when one node in RAC goes down

From: Yechiel Adar <adar666_at_inter.net.il>
Date: Thu, 04 Sep 2008 10:34:19 +0300
Message-id: <48BF8F7B.9060805@inter.net.il>


With the help of Oracle support we narrowed the problem down to name resolution. We shut down node 2 and started a session with client trace enabled. I saw in the trace that sqlnet decides to use server2-vip and then tries to convert the name to a TCP/IP address. It is at this step, converting server2-vip to a TCP/IP address, that sqlnet gets stuck.

It seems that something in the network is not updated when the VIP is moved to the other node, and it takes about 6 (or 6*2) seconds
until sqlnet gets an error from the network. It then tries to connect with the second entry, server-vip1, and this works.

Have you heard anything about this problem?

We are going to run a test using the IP addresses themselves instead of names in tnsnames.ora,
and also use a sniffer to find out what happens during these 6 seconds.
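
A rough way to run the name-versus-IP comparison from a client machine, before looking at the sniffer output, is to time the two steps separately: resolving the name, and opening a TCP connection to the listener. The sketch below is only an illustration in Python, not something we have run against this cluster. The function name time_resolve_and_connect is made up for this example, and port 1521 (the default Oracle listener port) and the address 192.0.2.12 in the usage comment are assumptions; substitute the real values.

```python
import socket
import time


def time_resolve_and_connect(host, port, timeout=10.0):
    """Time name resolution and TCP connect separately for one attempt.

    Returns (resolve_seconds, connect_seconds, error). connect_seconds is
    None if resolution failed; error is None if both steps succeeded.
    """
    t0 = time.perf_counter()
    try:
        # Step 1: resolve the name (returns immediately for a literal IP).
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return time.perf_counter() - t0, None, "resolve failed: %s" % exc
    resolve_s = time.perf_counter() - t0

    # Step 2: connect to the first resolved address.
    ip, resolved_port = infos[0][4][:2]
    error = None
    t1 = time.perf_counter()
    try:
        with socket.create_connection((ip, resolved_port), timeout=timeout):
            pass  # only the connect time matters; close immediately
    except OSError as exc:
        error = "connect failed: %s" % exc
    connect_s = time.perf_counter() - t1
    return resolve_s, connect_s, error


# Example usage (hypothetical targets; the IP is a placeholder):
#   for target in ("server2-vip", "192.0.2.12"):
#       print(target, time_resolve_and_connect(target, 1521))
```

If the delay shows up in the first number for the name but not for the IP, that points at name resolution; if it shows up in the second number for both, the time is being lost in the TCP connect to the VIP.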

Adar Yechiel
Rechovot, Israel

Yechiel Adar wrote:
> We need some help.
> RAC, Oracle 10.2.0.3 on Windows 2003 servers, 64-bit.
>
> We did a fail over test. We disconnected one server from the network
> by pulling the network cable.
> The system worked fine, but once in a while a connection would take 6
> seconds instead of 20 ms.
> We understand that this happens because the VIP is moved to the second
> computer and there is
> nothing there to handle calls on that TCP address.
>
> I would like to know how to shorten the time from 6 seconds to almost
> nothing.
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Sep 04 2008 - 02:34:19 CDT
