Re: the opposite of DCD (dead connection detection)??

From: Ben <benalvey_at_yahoo.com>
Date: Wed, 6 Feb 2008 05:44:06 -0800 (PST)
Message-ID: <9e8aa990-9398-4eca-8a43-89b9cc405a1b@u10g2000prn.googlegroups.com>


On Feb 5, 2:31 pm, joel garry <joel-ga..._at_home.com> wrote:
> On Feb 5, 5:12 am, Ben <benal..._at_yahoo.com> wrote:
>
>
>
>
>
> > On Feb 5, 7:51 am, Ben <benal..._at_yahoo.com> wrote:
>
> > > On Feb 4, 4:13 pm, Ben <benal..._at_yahoo.com> wrote:
>
> > > > On Jan 11, 1:42 pm, joel garry <joel-ga..._at_home.com> wrote:
>
> > > > > On Jan 10, 10:59 am, Ben <benal..._at_yahoo.com> wrote:
>
> > > > > > On Jan 9, 11:14 am, bdbafh <bdb..._at_gmail.com> wrote:
>
> > > > > > > On Jan 9, 10:52 am, Ben <benal..._at_yahoo.com> wrote:
>
> > > > > > > > 10.2.0.2 AIX 5.2 64bit.
>
> > > > > > > > Idle connections are getting terminated in databases on one of our
> > > > > > > > servers. I've checked profiles and resource plans to see if any have a
> > > > > > > > limit on idle time and there are none. It was first noticed in an
> > > > > > > > application and I've since been testing scenarios using sqlplus and it
> > > > > > > > happens with sqlplus also. I also thought it was limited to
> > > > > > > > connections that were coming in from another server but that isn't the
> > > > > > > > case either.
>
> > > > > > > > Certain things that stand out. It doesn't happen if the connection is
> > > > > > > > for the sysdba user. It doesn't happen if the connection is a local
> > > > > > > > connection that doesn't have the @sid in the connection string.
>
> > > > > > > > Any thoughts?
>
> > > > > > > Is there a host-based firewall in use (meaning on the localhost)?
>
> > > > > > > Can you test out this for connection on the same subnet (that would
> > > > > > > not traverse a router) vs. connections from clients not on the same
> > > > > > > subnet?
>
> > > > > > > This sure smells like a network issue where (translated) connections
> > > > > > > are being timed out.
>
> > > > > > > Start working with the network techs to see if they have any ideas.
>
> > > > > > > -bdbafh
>
> > > > > > Just another little piece of info. I setup a cronjob to see how long
> > > > > > the connections stay connected when they are idle, looks like
> > > > > > approximately 12 hours
>
> > > > > That sounds like a TCP setting.  Seehttp://www.tek-tips.com/viewthread.cfm?qid=1407908&page=12
>
> > > > > Found by googling on:
> > > > > tcp port timeout aix
>
> > > > > I think he was looking at the wrong setting in that post, but I'm not
> > > > > on AIX so can't comment further.  But I believe there are more
> > > > > settings, including some that timeout the port after a longer time
> > > > > than seconds.  Might have time or wait in the name.
>
> > > > > This one seems to be saying I'm wrong:http://unix.derkeiler.com/Newsgroups/comp.unix.aix/2006-12/msg00359.html
>
> > > > > Maybe there's some resource limit on the AIX side... like the ulimit
> > > > > cpu seconds?
>
> > > > > jg
> > > > > --
> > > > > @home.com is bogus.
> > > > > "A slipping gear could let your M203 grenade launcher fire when
> > > > > you least expect it. That would make you quite unpopular in
> > > > > what's left of  your unit." - Army's magazine of preventive
> > > > > maintenance.- Hide quoted text -
>
> > > > > - Show quoted text -
>
> > > > Is there a table that can be updated to make oracle think a session
> > > > has been idle for longer than it actually has?
> > > > I would like to determine for sure if this problem is from the
> > > > database or the OS. Our admin installed the patch that she found but
> > > > it didn't help.- Hide quoted text -
>
> > > > - Show quoted text -
>
> > > I think it is something with the database that is causing the issue.
>
> > > I believe I am getting the following error in the sqlnet.log everytime
> > > this happens. The only problem is that I don't have a sqlnet.ora setup
> > > where I would expect it to be, as a matter of fact I don't think there
> > > is one period. This looks like a timeout to me, but I thought the
> > > default was unlimited.
>
> > > ***********************************************************************
> > > Fatal NI connect error 12170.
>
> > >   VERSION INFORMATION:
> > >         TNS for IBM/AIX RISC System/6000: Version 10.2.0.2.0 -
> > > Production
> > >         TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000:
> > > Version 10.2.0.2.0 - Production
> > >         Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/
> > > 6000: Version 10.2.0.2.0 - Production
> > >   Time: 04-FEB-2008 11:05:02
> > >   Tracing not turned on.
> > >   Tns error struct:
> > >     ns main err code: 12535
> > >     TNS-12535: TNS:operation timed out
> > >     ns secondary err code: 12560
> > >     nt main err code: 505
> > >     TNS-00505: Operation timed out
> > >     nt secondary err code: 78
> > >     nt OS err code: 0
> > >   Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.31.249.254)
> > > (PORT=57681))
>
> > > ***********************************************************************
> > > Fatal NI connect error 12170.
>
> > >   VERSION INFORMATION:
> > >         TNS for IBM/AIX RISC System/6000: Version 10.2.0.2.0 -
> > > Production
> > >         TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000:
> > > Version 10.2.0.2.0 - Production
> > >         Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/
> > > 6000: Version 10.2.0.2.0 - Production
> > >   Time: 04-FEB-2008 19:46:53
> > >   Tracing not turned on.
> > >   Tns error struct:
> > >     ns main err code: 12535
> > >     TNS-12535: TNS:operation timed out
> > >     ns secondary err code: 12560
> > >     nt main err code: 505
> > >     TNS-00505: Operation timed out
> > >     nt secondary err code: 78
> > >     nt OS err code: 0
> > >   Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.31.249.254)
> > > (PORT=61393))- Hide quoted text -
>
> > > - Show quoted text -
>
> > Side note: I just did a find as root over the entire system and The
> > only sqlnet.ora file is under ORACLE_HOME/network/admin/samples
>
> > Also, the errors listed in the sqlnet.log are related to the logon
> > timing out, correct? The thing that confuses me about that is that I
> > don't have any issues with logging on. I can log the session on and
> > run some queries, then leave it idle for 12 hours and it gets killed
>
> Hmmmm:
>
> $ oerr tns 12535
> 12535, 00000, "TNS:operation timed out"
> // *Cause: The requested connection could not be completed within the
> timeout
> // period specified by the CONNECT_TIMEOUT parameter in listener.ora.
> This
> // error arises from the tnslsnr.
> // *Action: Either reconfigure CONNECT_TIMEOUT to be 0, which means
> // wait indefinitely, or reconfigure CONNECT_TIMEOUT to be some higher
> // value. Or, if the timeout is unacceptably long, turn on tracing
> // for further information.
>
> Of course, that error message is obsolete, seehttp://download.oracle.com/docs/cd/B19306_01/network.102/b14213/unsup...
> andhttp://download.oracle.com/docs/cd/B19306_01/network.102/b14213/liste...
>
> This all kind of strikes me as confusing too, perhaps there is a
> platform specific bug for AIX.  Also see metalink Note:345197.1  There
> must be something trying to log on after 12 hours idle is all I can
> guess.  Try setting the new parameters and tracing and see.
>
> Note:472572.1 is a long shot, but hey, maybe just one of your servers
> is missing an OS patch...
>
> jg
> --
> @home.com is bogus.http://www.nytimes.com/2008/02/05/science/space/05spotters.html?_r=1&...- Hide quoted text -
>
> - Show quoted text -

I just don't think it is related to 345197.1 I don't get any errors in the alert log.

I'm going to have our AIX admin check out the links from 472572.1

Also, I got a client side sqlnet trace for a connection that this occured. Here's the relavant portion..

[05-FEB-2008 15:26:55:978] nsdo: normal exit
[05-FEB-2008 15:26:55:978] nioqrc: exit
[06-FEB-2008 07:39:38:194] nioqsn: entry
[06-FEB-2008 07:39:38:195] nioqsn: exit
[06-FEB-2008 07:39:38:195] nioqrc: entry
[06-FEB-2008 07:39:38:195] nsdo: entry
[06-FEB-2008 07:39:38:195] nsdo: cid=0, opcode=84, *bl=0, *what=1,
uflgs=0x20, cflgs=0x3
[06-FEB-2008 07:39:38:195] nsdo: rank=64, nsctxrnk=0
[06-FEB-2008 07:39:38:195] nsdo: nsctx: state=8, flg=0x400d, mvd=0
[06-FEB-2008 07:39:38:195] nsdo: gtn=127, gtc=127, ptn=10, ptc=2011
[06-FEB-2008 07:39:38:196] nsdofls: entry
[06-FEB-2008 07:39:38:196] nsdofls: DATA flags: 0x0
[06-FEB-2008 07:39:38:196] nsdofls: sending NSPTDA packet
[06-FEB-2008 07:39:38:196] nspsend: entry
[06-FEB-2008 07:39:38:196] nspsend: plen=256, type=6
[06-FEB-2008 07:39:38:196] nttwr: entry
[06-FEB-2008 07:39:38:196] ntt2err: entry
[06-FEB-2008 07:39:38:196] ntt2err: soc 9 error - operation=6,
ntresnt[0]=517, ntresnt[1]=32, ntresnt[2]=0
[06-FEB-2008 07:39:38:196] ntt2err: exit
[06-FEB-2008 07:39:38:196] nttwr: exit
[06-FEB-2008 07:39:38:196] nspsend: 0 bytes to transport
[06-FEB-2008 07:39:38:196] nspsend: transport write error
[06-FEB-2008 07:39:38:196] nspsend: error exit
[06-FEB-2008 07:39:38:196] nserror: entry [06-FEB-2008 07:39:38:196] nserror: nsres: id=0, op=67, ns=12547, ns2=12560; nt[0]=517, nt[1]=32, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[06-FEB-2008 07:39:38:197] nsdofls: exit (-1)
[06-FEB-2008 07:39:38:197] nsdo: nsctxrnk=0
[06-FEB-2008 07:39:38:197] nsdo: error exit
[06-FEB-2008 07:39:38:197] nioqrc: send failed: bl = 1, nicbl = 1
[06-FEB-2008 07:39:38:197] nioqper:  error from nioqrc
[06-FEB-2008 07:39:38:197] nioqper:    ns main err code: 12547
[06-FEB-2008 07:39:38:197] nioqper:    ns (2)  err code: 12560
[06-FEB-2008 07:39:38:197] nioqper:    nt main err code: 517
[06-FEB-2008 07:39:38:197] nioqper:    nt (2)  err code: 32
[06-FEB-2008 07:39:38:197] nioqper:    nt OS   err code: 0
[06-FEB-2008 07:39:38:197] nioqer: entry
[06-FEB-2008 07:39:38:197] nioqer:  incoming err = 12150
[06-FEB-2008 07:39:38:197] nioqce: entry
[06-FEB-2008 07:39:38:197] nioqce: exit
[06-FEB-2008 07:39:38:197] nioqer:  returning err = 3135
[06-FEB-2008 07:39:38:198] nioqer: exit
[06-FEB-2008 07:39:38:198] nioqrc:  returning error: 3135
[06-FEB-2008 07:39:38:198] nioqrc: exit
[06-FEB-2008 07:39:38:198] nioqds: entry
[06-FEB-2008 07:39:38:198] nioqds:  disconnecting...
[06-FEB-2008 07:39:38:198] nsclose: entry
[06-FEB-2008 07:39:38:198] nstimarmed: entry
[06-FEB-2008 07:39:38:198] nstimarmed: no timer allocated
[06-FEB-2008 07:39:38:198] nstimarmed: normal exit
[06-FEB-2008 07:39:38:198] nttctl: entry
[06-FEB-2008 07:39:38:198] nttctl: entry
[06-FEB-2008 07:39:38:198] nsdo: entry
[06-FEB-2008 07:39:38:198] nsdo: cid=0, opcode=66, *bl=0, *what=0,
uflgs=0x0, cflgs=0x2
[06-FEB-2008 07:39:38:198] nsdo: rank=64, nsctxrnk=0
[06-FEB-2008 07:39:38:198] nsdo: nsctx: state=1, flg=0x4009, mvd=0
[06-FEB-2008 07:39:38:198] nsdo: entry
[06-FEB-2008 07:39:38:199] nsdo: cid=0, opcode=67, *bl=0, *what=1,
uflgs=0x0, cflgs=0x1
[06-FEB-2008 07:39:38:199] nsdo: nsctx: state=1, flg=0x4009, mvd=0
[06-FEB-2008 07:39:38:199] nsdo: error exit
[06-FEB-2008 07:39:38:199] nsbfr: entry
[06-FEB-2008 07:39:38:199] nsbaddfl: entry
[06-FEB-2008 07:39:38:199] nsbaddfl: normal exit
[06-FEB-2008 07:39:38:199] nsbfr: normal exit
[06-FEB-2008 07:39:38:199] nsbfr: entry
[06-FEB-2008 07:39:38:199] nsbaddfl: entry
[06-FEB-2008 07:39:38:199] nsbaddfl: normal exit
[06-FEB-2008 07:39:38:199] nsbfr: normal exit
[06-FEB-2008 07:39:38:199] nsdo: nsctxrnk=0
[06-FEB-2008 07:39:38:199] nsdo: normal exit
[06-FEB-2008 07:39:38:199] nsiocancel: entry
[06-FEB-2008 07:39:38:199] nsiofrrg: entry
[06-FEB-2008 07:39:38:199] nsbfr: entry
[06-FEB-2008 07:39:38:200] nsbaddfl: entry
[06-FEB-2008 07:39:38:200] nsbaddfl: normal exit
[06-FEB-2008 07:39:38:200] nsbfr: normal exit
[06-FEB-2008 07:39:38:200] nsiofrrg: exit
[06-FEB-2008 07:39:38:200] nsiocancel: exit
[06-FEB-2008 07:39:38:200] nsclose: closing transport
[06-FEB-2008 07:39:38:200] nttdisc: entry
[06-FEB-2008 07:39:38:200] nttdisc: Closed socket 9
[06-FEB-2008 07:39:38:200] nttdisc: exit
[06-FEB-2008 07:39:38:200] nsclose: global context check-out (from
slot 0) complete
[06-FEB-2008 07:39:38:200] nsnadisc: entry
[06-FEB-2008 07:39:38:200] nadisc: entry
[06-FEB-2008 07:39:38:200] nacomtm: entry
[06-FEB-2008 07:39:38:200] nacompd: entry
[06-FEB-2008 07:39:38:200] nacompd: exit
[06-FEB-2008 07:39:38:201] nacompd: entry
[06-FEB-2008 07:39:38:201] nacompd: exit
[06-FEB-2008 07:39:38:201] nacomtm: exit
[06-FEB-2008 07:39:38:201] nas_dis: entry
[06-FEB-2008 07:39:38:201] nas_dis: exit
[06-FEB-2008 07:39:38:201] nau_dis: entry
[06-FEB-2008 07:39:38:201] nau_dis: exit
[06-FEB-2008 07:39:38:201] naeetrm: entry
[06-FEB-2008 07:39:38:201] naeetrm: exit
[06-FEB-2008 07:39:38:201] naectrm: entry
[06-FEB-2008 07:39:38:201] naectrm: exit
[06-FEB-2008 07:39:38:201] nagbltrm: entry
[06-FEB-2008 07:39:38:201] nau_gtm: entry
[06-FEB-2008 07:39:38:201] nau_gtm: exit
[06-FEB-2008 07:39:38:201] nagbltrm: exit
[06-FEB-2008 07:39:38:201] nadisc: exit
[06-FEB-2008 07:39:38:202] nsnadisc: normal exit
[06-FEB-2008 07:39:38:202] nsopenfree_nsntx: nlhthdel from
mplx_ht_nsgbu, ctx=10237b50 nsntx=102504f0
[06-FEB-2008 07:39:38:202] nsiocancel: entry
[06-FEB-2008 07:39:38:202] nsiofrrg: entry
[06-FEB-2008 07:39:38:202] nsiofrrg: exit
[06-FEB-2008 07:39:38:202] nsiocancel: exit
[06-FEB-2008 07:39:38:202] nsmfr: entry
[06-FEB-2008 07:39:38:202] nsmfr: 2160 bytes at 0x1102504f0
[06-FEB-2008 07:39:38:202] nsmfr: normal exit
[06-FEB-2008 07:39:38:202] nsmfr: entry
[06-FEB-2008 07:39:38:202] nsmfr: 154 bytes at 0x110254db0
[06-FEB-2008 07:39:38:202] nsmfr: normal exit
[06-FEB-2008 07:39:38:202] nsmfr: entry
[06-FEB-2008 07:39:38:202] nsmfr: 264 bytes at 0x110217230
[06-FEB-2008 07:39:38:202] nsmfr: normal exit
[06-FEB-2008 07:39:38:203] nladtrm: entry
[06-FEB-2008 07:39:38:203] nladtrm: exit
[06-FEB-2008 07:39:38:203] nsmfr: entry
[06-FEB-2008 07:39:38:203] nsmfr: 1144 bytes at 0x110237b50
[06-FEB-2008 07:39:38:203] nsmfr: normal exit
[06-FEB-2008 07:39:38:203] nsclose: normal exit
[06-FEB-2008 07:39:38:203] nigcui: entry
[06-FEB-2008 07:39:38:203] nigcui: Clr User Interrupt: hdl=1,
prc=0x100c7c98, ctx=0x10216630.
[06-FEB-2008 07:39:38:203] nigcui: exit (0)
[06-FEB-2008 07:39:38:203] nioqds: exit
[06-FEB-2008 07:39:38:203] nsbfree: entry
[06-FEB-2008 07:39:38:203] nsbgetfl: entry
[06-FEB-2008 07:39:38:203] nsbgetfl: normal exit
[06-FEB-2008 07:39:38:203] nsbaddfl: entry
[06-FEB-2008 07:39:38:204] nsbaddfl: normal exit
[06-FEB-2008 07:39:38:204] nsbfree: normal exit
[06-FEB-2008 07:39:38:204] nsbfree: entry
[06-FEB-2008 07:39:38:204] nsbgetfl: entry
[06-FEB-2008 07:39:38:204] nsbgetfl: normal exit
[06-FEB-2008 07:39:38:204] nsbaddfl: entry
[06-FEB-2008 07:39:38:204] nsbaddfl: normal exit
[06-FEB-2008 07:39:38:204] nsbfree: normal exit
[06-FEB-2008 07:39:38:204] nigtrm: Count in the NI global area is now
2
[06-FEB-2008 07:39:38:204] nsbfrfl: entry
[06-FEB-2008 07:39:38:204] nsbrfr: entry
[06-FEB-2008 07:39:38:204] nsbrfr: nsbfs at 0x10253bb0, data at
0x10250e30.
[06-FEB-2008 07:39:38:204] nsbrfr: normal exit
[06-FEB-2008 07:39:38:204] nsbrfr: entry
[06-FEB-2008 07:39:38:204] nsbrfr: nsbfs at 0x102544b0, data at
0x10254570.
[06-FEB-2008 07:39:38:205] nsbrfr: normal exit
[06-FEB-2008 07:39:38:205] nsbrfr: entry
[06-FEB-2008 07:39:38:205] nsbrfr: nsbfs at 0x10250d70, data at
0x10252670.
[06-FEB-2008 07:39:38:205] nsbrfr: normal exit
[06-FEB-2008 07:39:38:205] nsbrfr: entry
[06-FEB-2008 07:39:38:205] nsbrfr: nsbfs at 0x101ec870, data at
0x10253c70.
[06-FEB-2008 07:39:38:205] nsbrfr: normal exit
[06-FEB-2008 07:39:38:205] nsbrfr: entry
[06-FEB-2008 07:39:38:205] nsbrfr: nsbfs at 0x101ec930, data at
0x10252eb0.
[06-FEB-2008 07:39:38:205] nsbrfr: normal exit
[06-FEB-2008 07:39:38:205] nsbfrfl: normal exit
[06-FEB-2008 07:39:38:205] nttdisc: entry
[06-FEB-2008 07:39:38:205] nttdisc: Closed socket 8
[06-FEB-2008 07:39:38:205] nttdisc: exit
[06-FEB-2008 07:39:38:205] nigtrm: Count in the NL global area is now
3 Received on Wed Feb 06 2008 - 07:44:06 CST

Original text of this message