Re: DCD dead connection detection in 12c

From: Jeremy Schneider <jeremy.schneider_at_ardentperf.com>
Date: Fri, 15 Aug 2014 12:52:35 -0400
Message-ID: <CA+fnDAbs96-90LA5QF-CSPQnf=BwqRP=dOum17xJZFEmEM-M4g_at_mail.gmail.com>



Probably most people wouldn't notice, but I mistakenly took that example output from the database server. You should actually run it on the client side. Here's what it looks like from two different clients.

With enable=broken in the JDBC connect string:
tcp 0 0 ::ffff:192.168.1.216:43940 ::ffff:192.168.1.130:1521 ESTABLISHED keepalive (2271.79/0/0)
tcp 0 0 ::ffff:192.168.1.216:42615 ::ffff:192.168.1.130:1521 ESTABLISHED keepalive (4657.81/0/0)
tcp 110 0 ::ffff:192.168.1.216:40552 ::ffff:192.168.1.130:1521 ESTABLISHED keepalive (1074.00/0/0)

Without enable=broken in the JDBC connect string:
tcp 2970 0 ::ffff:192.168.1.181:60553 ::ffff:192.168.1.170:1521 ESTABLISHED off (0.00/0/0)
tcp 2910 0 ::ffff:192.168.1.181:59678 ::ffff:192.168.1.170:1521 ESTABLISHED off (0.00/0/0)
tcp 0 0 ::ffff:192.168.1.181:60610 ::ffff:192.168.1.170:1521 ESTABLISHED off (0.00/0/0)
tcp 2980 0 ::ffff:192.168.1.181:59744 ::ffff:192.168.1.170:1521 ESTABLISHED off (0.00/0/0)

OS settings are identical on these two servers.
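
With more than a few connections, checking the far-right column by eye gets tedious. Below is a minimal Python sketch that parses `netstat -nto` text like the client output above and lists established connections whose timer is not `keepalive`. The regular expression assumes the Linux net-tools column layout seen in this thread, and `parse_netstat`, `without_keepalive` and `Conn` are hypothetical helpers, not part of any tool.

```python
import re
from typing import NamedTuple

# One TCP line of Linux "netstat -nto" output, for example:
# tcp 0 0 ::ffff:192.168.1.216:43940 ::ffff:192.168.1.130:1521 ESTABLISHED keepalive (2271.79/0/0)
# The first number in parentheses is the seconds left on the named timer.
LINE_RE = re.compile(
    r"^tcp6?\s+(?P<recvq>\d+)\s+(?P<sendq>\d+)\s+"
    r"(?P<local>\S+)\s+(?P<remote>\S+)\s+(?P<state>\S+)\s+"
    r"(?P<timer>\S+)\s+\((?P<remaining>[\d.]+)/\d+/\d+\)"
)


class Conn(NamedTuple):
    local: str
    remote: str
    state: str
    timer: str        # "keepalive", "off", "on", "timewait", ...
    remaining: float  # seconds left on that timer


def _port(addr: str) -> str:
    # The port follows the last colon, for both IPv4 and ::ffff: forms.
    return addr.rsplit(":", 1)[1]


def parse_netstat(text: str, port: int = 1521) -> list[Conn]:
    """Keep TCP lines with `port` on either end (like grep 1521, but
    matching only the port fields)."""
    conns = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # headers, UDP/unix sockets, wrapped lines
        if str(port) not in (_port(m["local"]), _port(m["remote"])):
            continue
        conns.append(Conn(m["local"], m["remote"], m["state"],
                          m["timer"], float(m["remaining"])))
    return conns


def without_keepalive(conns: list[Conn]) -> list[Conn]:
    """Established connections that currently have no keepalive timer."""
    return [c for c in conns if c.state == "ESTABLISHED" and c.timer != "keepalive"]


sample = """\
tcp 0 0 ::ffff:192.168.1.216:43940 ::ffff:192.168.1.130:1521 ESTABLISHED keepalive (2271.79/0/0)
tcp 2970 0 ::ffff:192.168.1.181:60553 ::ffff:192.168.1.170:1521 ESTABLISHED off (0.00/0/0)
"""
for c in without_keepalive(parse_netstat(sample)):
    print(c.local, "->", c.remote, c.timer)
```

On a client, the input text could come from `subprocess.run(["netstat", "-nto"], capture_output=True, text=True).stdout`; any connection the sketch returns has no keepalive timer running at that moment.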

-J

--
http://about.me/jeremy_schneider


On Fri, Aug 15, 2014 at 12:42 PM, Jeremy Schneider <jeremy.schneider_at_ardentperf.com> wrote:


> Adding just two more points, since I have recently been working on DCD
> with Red Hat Linux myself.
>
> strace is quite detailed, but a much easier way to do the job is to just
> use "netstat -nto|grep 1521" (replace 1521 with your listener port if it's
> non-default). The "o" option is the magic one for keepalive: in the
> far-right column you should see the string "keepalive" rather than "off",
> along with the actual amount of time remaining on each keepalive
> connection's timer.
>
> Example output:
> tcp 0 0 192.168.1.130:1521 192.168.1.130:22335 ESTABLISHED keepalive (2380.58/0/0)
> tcp 0 0 192.168.1.130:1521 192.168.1.104:56698 ESTABLISHED off (0.00/0/0)
> tcp 0 0 192.168.1.130:1521 192.168.1.146:56850 ESTABLISHED off (0.00/0/0)
> tcp 0 0 192.168.1.130:1521 192.168.1.130:31120 TIME_WAIT timewait (13.21/0/0)
>
> Notice that the connection from the db server to itself (130) above has
> keepalive enabled, but the connections from the clients (104 and 146) do
> not. This brings up a second point. We were using the thin JDBC driver in
> some cases and discovered that it did not enable keepalive unless we
> switched to the long-format connect string and explicitly specified
> "(enable=broken)" in the long TNS entry. This is in addition to the kernel
> settings, which must also be configured correctly.
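
For illustration, here is a small Python sketch that builds the kind of long-format JDBC thin URL described above, with `(ENABLE=BROKEN)` inside the `DESCRIPTION`. The function name is made up for this example, the host and service names are hypothetical placeholders, and the exact descriptor should be checked against the Oracle Net documentation for your version.

```python
def thin_url_with_keepalive(host: str, port: int, service_name: str) -> str:
    """Return a JDBC thin URL that uses a long-format TNS descriptor with
    ENABLE=BROKEN. The short form (jdbc:oracle:thin:@//host:port/service)
    used for comparison does not carry this parameter."""
    descriptor = (
        "(DESCRIPTION="
        "(ENABLE=BROKEN)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )
    return "jdbc:oracle:thin:@" + descriptor


# Hypothetical host and service, for illustration only.
print(thin_url_with_keepalive("dbhost.example.com", 1521, "orclpdb"))
```

The resulting string is passed to the driver like any other URL; only the descriptor form changes.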
>
> -Jeremy
>
>
> --
> http://about.me/jeremy_schneider
>
>
> On Thu, Aug 14, 2014 at 12:43 PM, Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com> wrote:
>
>> Hello April,
>> Since you have set sqlnet.expire_time to 10 minutes, a TCP/IP packet is
>> sent to that client port every 10 minutes. If a TCP ACK is received
>> within a short interval, both the tcp_keepalive and SQLNET timers are
>> reset. If the TCP ACK is not received, the TCP retransmission code kicks
>> in: the packet is retransmitted up to tcp_retries2 times (15 by default),
>> with exponential backoff controlled by the TCP retransmission interval.
>> So, in your case, TCP shouldn't kill the connection after 2 hours at all,
>> at least not from the host side. However, I have seen port-level timeouts
>> in switch/firewall configurations that are normally set to 2 hours. Check
>> with the network group to see if that is happening.
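
To put a rough number on the retransmission path, a simple Python model follows. It assumes the Linux defaults of a 200 ms minimum and 120 s maximum retransmission timeout (the kernel's TCP_RTO_MIN and TCP_RTO_MAX); the real starting value depends on the measured round-trip time, so treat the result as a lower bound rather than a prediction.

```python
def retransmission_timeout(retries2: int = 15,
                           rto_min: float = 0.2,
                           rto_max: float = 120.0) -> float:
    """Approximate seconds until Linux abandons unacknowledged data.

    The retransmission timer starts at rto_min, doubles after every
    retransmission and is capped at rto_max. The connection is dropped when
    the timer expires after the last of `retries2` retransmissions, so
    retries2 + 1 timer periods elapse in total.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


print(round(retransmission_timeout(), 1))  # 924.6 with the defaults
```

With tcp_retries2 = 15 this comes to about 924.6 seconds (roughly 15 minutes), the same figure the kernel's ip-sysctl documentation gives for the default.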
>> Also conduct this test:
>> a. Create a sqlplus connection from that client machine to the
>> database.
>> b. Identify the dedicated server process for that connection and strace
>> it:
>> strace -tttT -o /tmp/dcd.lst -p <pid>
>> c. Keep the sqlplus connection idle during this period; don't even
>> press Enter.
>> Reading the /tmp/dcd.lst file, you should see packets every 10 minutes.
>> If the connection dies after 2 hours, check with the firewall/network group.
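
Scanning a long trace by hand for a 10-minute pattern is error-prone. A minimal Python sketch, assuming the `-ttt` format (an epoch timestamp at the start of each line) produced by the command above, reports the gap between consecutive system calls, optionally limited to a chosen set of call names. Which calls, if any, correspond to the probe is an assumption to verify; if the probe is generated by the kernel's keepalive machinery rather than by the server process, it may not appear in the trace at all. `syscall_gaps` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# With -ttt, strace starts each line with seconds since the epoch, e.g.
# 1408030600.000000 write(14, "...", 10) = 10 <0.000031>
TRACE_RE = re.compile(r"^(?P<ts>\d+\.\d+)\s+(?P<call>\w+)\(")


def syscall_gaps(lines: Iterable[str],
                 calls: Optional[set[str]] = None) -> list[tuple[float, str]]:
    """Return (seconds since the previous matching call, call name) pairs."""
    gaps = []
    prev = None
    for line in lines:
        m = TRACE_RE.match(line)
        if not m or (calls is not None and m["call"] not in calls):
            continue  # signals, exit lines, or calls we are not interested in
        ts = float(m["ts"])
        if prev is not None:
            gaps.append((round(ts - prev, 6), m["call"]))
        prev = ts
    return gaps
```

For example, iterating over `syscall_gaps(open("/tmp/dcd.lst"))` and printing each pair shows whether matching calls recur at roughly 600-second intervals.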
>>
>> Hope this helps,
>>
>> Cheers
>>
>> Riyaj Shamsudeen
>> Principal DBA,
>> Ora!nternals - http://www.orainternals.com - Specialists in
>> Performance, RAC and EBS
>> Blog: http://orainternals.wordpress.com/
>> Oracle ACE Director and OakTable member <http://www.oaktable.com/>
>>
>> Co-author of the books: Expert Oracle Practices
>> <http://tinyurl.com/book-expert-oracle-practices/>, Pro Oracle SQL
>> <http://tinyurl.com/ahpvms8>, Expert RAC Practices 12c
>> <http://tinyurl.com/expert-rac-12c>, Expert PL/SQL Practices
>> <http://tinyurl.com/book-expert-plsql-practices>
>>
>>
>>
>>
>> On Thu, Aug 14, 2014 at 7:30 AM, April Sims <aprilcsims_at_gmail.com> wrote:
>>
>>> We need some help resolving the new idle timeouts we have seen since
>>> upgrading to 12c.
>>> I have this document:
>>>
>>> Oracle Net 12c: New Implementation of Dead Connection Detection (DCD)
>>> (Doc ID 1591874.1)
>>>
>>> We are on 64-bit Red Hat Linux, so it applies to us.
>>>
>>> Our current OS settings look like the following:
>>>
>>> # cat /proc/sys/net/ipv4/tcp_keepalive_time
>>> 7200
>>>
>>> # cat /proc/sys/net/ipv4/tcp_keepalive_intvl
>>> 75
>>>
>>> # cat /proc/sys/net/ipv4/tcp_keepalive_probes
>>> 9
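
For reference, these three kernel settings only govern sockets that have keepalive enabled (SO_KEEPALIVE). On such a socket, Linux waits tcp_keepalive_time seconds of idle time, then sends up to tcp_keepalive_probes probes tcp_keepalive_intvl seconds apart, and drops the connection one interval after the last unanswered probe. A small Python sketch of that arithmetic, with a hypothetical helper that reads the values from the same /proc files:

```python
from pathlib import Path

PROC_IPV4 = Path("/proc/sys/net/ipv4")


def read_keepalive_settings(root: Path = PROC_IPV4) -> tuple[int, int, int]:
    """Read (time, intvl, probes) from the files shown above."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    time_s, intvl_s, probes = (int((root / n).read_text().strip()) for n in names)
    return time_s, intvl_s, probes


def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Seconds from the last traffic on an idle keepalive socket until Linux
    drops it, assuming every probe goes unanswered: the idle period plus
    one interval per probe."""
    return time_s + intvl_s * probes


print(keepalive_detection_seconds(7200, 75, 9))  # 7875
```

With the values shown, that is 7200 + 9 * 75 = 7875 seconds, or about 2 hours 11 minutes; on a live system `keepalive_detection_seconds(*read_keepalive_settings())` computes the same figure from /proc.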
>>>
>>>
>>> sqlnet.ora
>>> SQLNET.EXPIRE_TIME = 10
>>>
>>> SQLNET.INBOUND_CONNECT_TIMEOUT = 120
>>>
>>> listener.ora
>>>
>>> INBOUND_CONNECT_TIMEOUT_LISTENER_listenername = 120
>>>
>>> Any suggestions on the changes I need to make to prevent a 2-hour idle
>>> timeout?
>>>
>>> thanks,
>>>
>>>
>>> --
>>> April C. Sims
>>> http://aprilcsims.wordpress.com
>>> Twitter, LinkedIn
>>> Oracle Database 11g - Underground Advice for Database Administrators
>>> https://www.packtpub.com/oracle-11g-database-implementations-guide/book
>>> OCP 8i, 9i, 10g, 11g DBA
>>> Southern Utah University
>>> aprilcsims_at_gmail.com
>>>
>>
>>
>
-- http://www.freelists.org/webpage/oracle-l
Received on Fri Aug 15 2014 - 18:52:35 CEST
