RE: DCD and TCP timeout

From: Hameed, Amir <Amir.Hameed_at_xerox.com>
Date: Wed, 13 Nov 2013 00:06:08 +0000
Message-ID: <AF02C941134B1A4AB5F61A726D08DCED0DEF54DA_at_USA7109MB012.na.xerox.net>



Thanks Riyaj.
This whole investigation started during destructive testing of the ERP Concurrent Managers. There are two VM servers in the Concurrent Processing (CP) tier, configured as active/passive. When the node running the Internal Concurrent Manager (ICM) was shut down, the ICM and the other managers did not fail over in a timely manner. Further investigation showed that the connections of those managers were still reported as active in the V$SESSION view. Those connections only started to clean up after about 15-18 minutes, and that is when the ICM failed over to its secondary node, followed by the other managers. I see the following settings in the note "Performance problem with Oracle*Net Failover when TCP Network down (no IP address)" (Doc ID 249213.1):
net.ipv4.tcp_keepalive_time 3000
net.ipv4.tcp_retries 5
net.ipv4.tcp_syn_retries 1

These settings are supposed to reduce the timeout period to about 20 seconds.
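Before changing anything, it may help to record the current values on the CP nodes and the database server. Below is a minimal Python sketch, assuming a Linux host that exposes kernel parameters under /proc/sys; the helper names are hypothetical. It reads tcp_retries2 rather than the note's "tcp_retries", on the assumption that tcp_retries2 is what is meant, since as far as I know Linux only exposes tcp_retries1 and tcp_retries2.

```python
from pathlib import Path
from typing import Optional

# Parameters discussed in this thread. Assumption: the note's "tcp_retries"
# refers to tcp_retries2 (Linux exposes tcp_retries1 and tcp_retries2).
PARAMS = (
    "net.ipv4.tcp_keepalive_time",
    "net.ipv4.tcp_retries2",
    "net.ipv4.tcp_syn_retries",
)


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name (net.ipv4.x) to its file under root."""
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[int]:
    """Return the first integer in the sysctl file, or None if it is absent."""
    try:
        return int(sysctl_path(name, root).read_text().split()[0])
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    for name in PARAMS:
        print(f"{name} = {read_sysctl(name)}")
```

The same values can be read from the shell with `sysctl -n <name>`; the script just makes it easy to compare several hosts.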

Do you have any suggestions on the above settings?

Thanks again.
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Riyaj Shamsudeen
Sent: Tuesday, November 12, 2013 6:36 PM
To: Hameed, Amir
Cc: Oracle List List
Subject: Re: DCD and TCP timeout

Amir
Setting expire_time to 1 will send a SQL*Net packet every minute (so a TCP/IP probe is sent every minute). Under normal conditions, a TCP ACK is received immediately.

But if the client dies abruptly or the connection is forcibly killed, the TCP retransmission code kicks in for the unacknowledged TCP/IP transmission. On Linux, the tcp_retries2 kernel parameter controls retransmission behavior. For a connection in the ESTABLISHED state, TCP/IP retransmits 15 times (the default value of tcp_retries2), with exponential backoff of the retransmission interval, before reporting the failure to the application. Read the link below; I think I got results similar to your test case the last time I performed this test. This behavior applies only to Linux.
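To see why cleanup takes so long, here is a minimal Python model of the backoff described above. It assumes the Linux kernel constants TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s and an initial RTO equal to TCP_RTO_MIN; the real initial RTO is derived from the measured round-trip time, so these figures are lower bounds. The function names are hypothetical. With the default tcp_retries2 = 15 it gives 924.6 s, the figure quoted in the kernel's ip-sysctl documentation.

```python
# Assumed Linux kernel constants (TCP_RTO_MIN / TCP_RTO_MAX), in milliseconds.
RTO_MIN_MS = 200
RTO_MAX_MS = 120_000


def retransmission_timeout_ms(retries2: int,
                              rto_min_ms: int = RTO_MIN_MS,
                              rto_max_ms: int = RTO_MAX_MS) -> int:
    """Milliseconds from the first unacknowledged send until TCP gives up.

    The RTO doubles after every retransmission (exponential backoff) and is
    capped at rto_max_ms. TCP waits one RTO after the original send and one
    after each of the `retries2` retransmissions.
    """
    total = 0
    rto = rto_min_ms
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max_ms)
    return total


def dead_client_detection_s(expire_time_min: int, retries2: int) -> float:
    """Hypothetical worst case: wait up to one expire_time interval for the
    next DCD probe, then for TCP to exhaust its retransmissions."""
    return expire_time_min * 60 + retransmission_timeout_ms(retries2) / 1000


for r in (15, 8):
    print(f"tcp_retries2={r}: {retransmission_timeout_ms(r) / 1000:.1f} s")
```

Under these assumptions, expire_time = 1 with the default tcp_retries2 = 15 gives roughly 925 to 985 seconds (about 15.5 to 16.5 minutes), which is in the same range as the 15-18 minutes observed in the failover test. With tcp_retries2 = 8 the retransmission phase drops to about 102 seconds.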

Do you really want to change the TCP-level parameters? If so, you can reduce the tcp_retries2 parameter to a value above 8 and test it (please let me know if you still see different behavior after the adjustment). If that isn't enough, tcp_keepalive_time can be reduced to 10 minutes, but that can increase network traffic marginally (one TCP keepalive probe every 10 minutes on each idle TCP/IP socket).
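For a rough sense of what lowering tcp_keepalive_time buys, the sketch below adds up the Linux keepalive timeline for an idle socket with keepalive enabled. Only tcp_keepalive_time comes from this thread; tcp_keepalive_intvl = 75 s and tcp_keepalive_probes = 9 are assumed to be at the stock Linux defaults, and the helper name is hypothetical.

```python
# Hypothetical helper: seconds until Linux keepalive declares an idle peer
# dead. Assumes the stock defaults tcp_keepalive_intvl = 75 s and
# tcp_keepalive_probes = 9; only tcp_keepalive_time is from the thread.
def keepalive_detection_s(keepalive_time: int = 7200,
                          keepalive_intvl: int = 75,
                          keepalive_probes: int = 9) -> int:
    """Idle time before the first probe, plus one interval per unanswered probe."""
    return keepalive_time + keepalive_probes * keepalive_intvl


default_s = keepalive_detection_s()                   # 7200 s default idle time
ten_min_s = keepalive_detection_s(keepalive_time=600)  # the 10-minute suggestion
print(default_s, ten_min_s)
```

Keepalive only comes into play on idle connections; while data such as a DCD probe is unacknowledged, the tcp_retries2 retransmission behavior governs instead, which is why tcp_retries2 is the first knob to try.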

Read: http://stackoverflow.com/questions/5907527/application-control-of-tcp-retransmission-on-linux

 HTH Cheers

Riyaj Shamsudeen
Principal DBA,
Ora!nternals - http://www.orainternals.com - Specialists in Performance, RAC and EBS
Blog: http://orainternals.wordpress.com/
Oracle ACE Director and OakTable member (http://www.oaktable.com/)

Co-author of the books: Expert Oracle Practices (http://tinyurl.com/book-expert-oracle-practices/), Pro Oracle SQL (http://tinyurl.com/ahpvms8), Expert RAC Practices 12c (http://tinyurl.com/expert-rac-12c), Expert PL/SQL Practices (http://tinyurl.com/book-expert-plsql-practices)

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Nov 13 2013 - 01:06:08 CET
