Re: the opposite of DCD (dead connection detection)??
Date: Wed, 6 Feb 2008 05:44:06 -0800 (PST)
Message-ID: <9e8aa990-9398-4eca-8a43-89b9cc405a1b@u10g2000prn.googlegroups.com>
On Feb 5, 2:31 pm, joel garry <joel-ga..._at_home.com> wrote:
> On Feb 5, 5:12 am, Ben <benal..._at_yahoo.com> wrote:
>
>
>
>
>
> > On Feb 5, 7:51 am, Ben <benal..._at_yahoo.com> wrote:
>
> > > On Feb 4, 4:13 pm, Ben <benal..._at_yahoo.com> wrote:
>
> > > > On Jan 11, 1:42 pm, joel garry <joel-ga..._at_home.com> wrote:
>
> > > > > On Jan 10, 10:59 am, Ben <benal..._at_yahoo.com> wrote:
>
> > > > > > On Jan 9, 11:14 am, bdbafh <bdb..._at_gmail.com> wrote:
>
> > > > > > > On Jan 9, 10:52 am, Ben <benal..._at_yahoo.com> wrote:
>
> > > > > > > > 10.2.0.2 AIX 5.2 64bit.
>
> > > > > > > > Idle connections are getting terminated in databases on one of our
> > > > > > > > servers. I've checked profiles and resource plans to see if any have a
> > > > > > > > limit on idle time and there are none. It was first noticed in an
> > > > > > > > application and I've since been testing scenarios using sqlplus and it
> > > > > > > > happens with sqlplus also. I also thought it was limited to
> > > > > > > > connections that were coming in from another server but that isn't the
> > > > > > > > case either.
>
> > > > > > > > Certain things that stand out. It doesn't happen if the connection is
> > > > > > > > for the sysdba user. It doesn't happen if the connection is a local
> > > > > > > > connection that doesn't have the @sid in the connection string.
>
> > > > > > > > Any thoughts?
>
> > > > > > > Is there a host-based firewall in use (meaning on the localhost)?
>
> > > > > > > Can you test out this for connection on the same subnet (that would
> > > > > > > not traverse a router) vs. connections from clients not on the same
> > > > > > > subnet?
>
> > > > > > > This sure smells like a network issue where (translated) connections
> > > > > > > are being timed out.
>
> > > > > > > Start working with the network techs to see if they have any ideas.
>
> > > > > > > -bdbafh
>
> > > > > > Just another little piece of info. I setup a cronjob to see how long
> > > > > > the connections stay connected when they are idle, looks like
> > > > > > approximately 12 hours
>
> > > > > That sounds like a TCP setting. Seehttp://www.tek-tips.com/viewthread.cfm?qid=1407908&page=12
>
> > > > > Found by googling on:
> > > > > tcp port timeout aix
>
> > > > > I think he was looking at the wrong setting in that post, but I'm not
> > > > > on AIX so can't comment further. But I believe there are more
> > > > > settings, including some that timeout the port after a longer time
> > > > > than seconds. Might have time or wait in the name.
>
> > > > > This one seems to be saying I'm wrong:http://unix.derkeiler.com/Newsgroups/comp.unix.aix/2006-12/msg00359.html
>
> > > > > Maybe there's some resource limit on the AIX side... like the ulimit
> > > > > cpu seconds?
>
> > > > > jg
> > > > > --
> > > > > @home.com is bogus.
> > > > > "A slipping gear could let your M203 grenade launcher fire when
> > > > > you least expect it. That would make you quite unpopular in
> > > > > what's left of your unit." - Army's magazine of preventive
> > > > > maintenance.- Hide quoted text -
>
> > > > > - Show quoted text -
>
> > > > Is there a table that can be updated to make oracle think a session
> > > > has been idle for longer than it actually has?
> > > > I would like to determine for sure if this problem is from the
> > > > database or the OS. Our admin installed the patch that she found but
> > > > it didn't help.- Hide quoted text -
>
> > > > - Show quoted text -
>
> > > I think it is something with the database that is causing the issue.
>
> > > I believe I am getting the following error in the sqlnet.log everytime
> > > this happens. The only problem is that I don't have a sqlnet.ora setup
> > > where I would expect it to be, as a matter of fact I don't think there
> > > is one period. This looks like a timeout to me, but I thought the
> > > default was unlimited.
>
> > > ***********************************************************************
> > > Fatal NI connect error 12170.
>
> > > VERSION INFORMATION:
> > > TNS for IBM/AIX RISC System/6000: Version 10.2.0.2.0 -
> > > Production
> > > TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000:
> > > Version 10.2.0.2.0 - Production
> > > Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/
> > > 6000: Version 10.2.0.2.0 - Production
> > > Time: 04-FEB-2008 11:05:02
> > > Tracing not turned on.
> > > Tns error struct:
> > > ns main err code: 12535
> > > TNS-12535: TNS:operation timed out
> > > ns secondary err code: 12560
> > > nt main err code: 505
> > > TNS-00505: Operation timed out
> > > nt secondary err code: 78
> > > nt OS err code: 0
> > > Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.31.249.254)
> > > (PORT=57681))
>
> > > ***********************************************************************
> > > Fatal NI connect error 12170.
>
> > > VERSION INFORMATION:
> > > TNS for IBM/AIX RISC System/6000: Version 10.2.0.2.0 -
> > > Production
> > > TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000:
> > > Version 10.2.0.2.0 - Production
> > > Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/
> > > 6000: Version 10.2.0.2.0 - Production
> > > Time: 04-FEB-2008 19:46:53
> > > Tracing not turned on.
> > > Tns error struct:
> > > ns main err code: 12535
> > > TNS-12535: TNS:operation timed out
> > > ns secondary err code: 12560
> > > nt main err code: 505
> > > TNS-00505: Operation timed out
> > > nt secondary err code: 78
> > > nt OS err code: 0
> > > Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.31.249.254)
> > > (PORT=61393))- Hide quoted text -
>
> > > - Show quoted text -
>
> > Side note: I just did a find as root over the entire system and The
> > only sqlnet.ora file is under ORACLE_HOME/network/admin/samples
>
> > Also, the errors listed in the sqlnet.log are related to the logon
> > timing out, correct? The thing that confuses me about that is that I
> > don't have any issues with logging on. I can log the session on and
> > run some queries, then leave it idle for 12 hours and it gets killed
>
> Hmmmm:
>
> $ oerr tns 12535
> 12535, 00000, "TNS:operation timed out"
> // *Cause: The requested connection could not be completed within the
> timeout
> // period specified by the CONNECT_TIMEOUT parameter in listener.ora.
> This
> // error arises from the tnslsnr.
> // *Action: Either reconfigure CONNECT_TIMEOUT to be 0, which means
> // wait indefinitely, or reconfigure CONNECT_TIMEOUT to be some higher
> // value. Or, if the timeout is unacceptably long, turn on tracing
> // for further information.
>
> Of course, that error message is obsolete, seehttp://download.oracle.com/docs/cd/B19306_01/network.102/b14213/unsup...
> andhttp://download.oracle.com/docs/cd/B19306_01/network.102/b14213/liste...
>
> This all kind of strikes me as confusing too, perhaps there is a
> platform specific bug for AIX. Also see metalink Note:345197.1 There
> must be something trying to log on after 12 hours idle is all I can
> guess. Try setting the new parameters and tracing and see.
>
> Note:472572.1 is a long shot, but hey, maybe just one of your servers
> is missing an OS patch...
>
> jg
> --
> @home.com is bogus.http://www.nytimes.com/2008/02/05/science/space/05spotters.html?_r=1&...- Hide quoted text -
>
> - Show quoted text -
I just don't think it is related to 345197.1 I don't get any errors in the alert log.
I'm going to have our AIX admin check out the links from 472572.1
Also, I got a client side sqlnet trace for a connection that this occured. Here's the relavant portion..
[05-FEB-2008 15:26:55:978] nsdo: normal exit [05-FEB-2008 15:26:55:978] nioqrc: exit [06-FEB-2008 07:39:38:194] nioqsn: entry [06-FEB-2008 07:39:38:195] nioqsn: exit [06-FEB-2008 07:39:38:195] nioqrc: entry [06-FEB-2008 07:39:38:195] nsdo: entry [06-FEB-2008 07:39:38:195] nsdo: cid=0, opcode=84, *bl=0, *what=1,uflgs=0x20, cflgs=0x3
[06-FEB-2008 07:39:38:195] nsdo: rank=64, nsctxrnk=0 [06-FEB-2008 07:39:38:195] nsdo: nsctx: state=8, flg=0x400d, mvd=0 [06-FEB-2008 07:39:38:195] nsdo: gtn=127, gtc=127, ptn=10, ptc=2011 [06-FEB-2008 07:39:38:196] nsdofls: entry [06-FEB-2008 07:39:38:196] nsdofls: DATA flags: 0x0 [06-FEB-2008 07:39:38:196] nsdofls: sending NSPTDA packet [06-FEB-2008 07:39:38:196] nspsend: entry [06-FEB-2008 07:39:38:196] nspsend: plen=256, type=6 [06-FEB-2008 07:39:38:196] nttwr: entry [06-FEB-2008 07:39:38:196] ntt2err: entry [06-FEB-2008 07:39:38:196] ntt2err: soc 9 error - operation=6, ntresnt[0]=517, ntresnt[1]=32, ntresnt[2]=0 [06-FEB-2008 07:39:38:196] ntt2err: exit [06-FEB-2008 07:39:38:196] nttwr: exit [06-FEB-2008 07:39:38:196] nspsend: 0 bytes to transport [06-FEB-2008 07:39:38:196] nspsend: transport write error [06-FEB-2008 07:39:38:196] nspsend: error exit[06-FEB-2008 07:39:38:196] nserror: entry [06-FEB-2008 07:39:38:196] nserror: nsres: id=0, op=67, ns=12547, ns2=12560; nt[0]=517, nt[1]=32, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[06-FEB-2008 07:39:38:197] nsdofls: exit (-1) [06-FEB-2008 07:39:38:197] nsdo: nsctxrnk=0 [06-FEB-2008 07:39:38:197] nsdo: error exit [06-FEB-2008 07:39:38:197] nioqrc: send failed: bl = 1, nicbl = 1 [06-FEB-2008 07:39:38:197] nioqper: error from nioqrc [06-FEB-2008 07:39:38:197] nioqper: ns main err code: 12547 [06-FEB-2008 07:39:38:197] nioqper: ns (2) err code: 12560 [06-FEB-2008 07:39:38:197] nioqper: nt main err code: 517 [06-FEB-2008 07:39:38:197] nioqper: nt (2) err code: 32 [06-FEB-2008 07:39:38:197] nioqper: nt OS err code: 0 [06-FEB-2008 07:39:38:197] nioqer: entry [06-FEB-2008 07:39:38:197] nioqer: incoming err = 12150 [06-FEB-2008 07:39:38:197] nioqce: entry [06-FEB-2008 07:39:38:197] nioqce: exit [06-FEB-2008 07:39:38:197] nioqer: returning err = 3135 [06-FEB-2008 07:39:38:198] nioqer: exit [06-FEB-2008 07:39:38:198] nioqrc: returning error: 3135 [06-FEB-2008 07:39:38:198] nioqrc: exit [06-FEB-2008 07:39:38:198] nioqds: entry [06-FEB-2008 07:39:38:198] nioqds: disconnecting... [06-FEB-2008 07:39:38:198] nsclose: entry [06-FEB-2008 07:39:38:198] nstimarmed: entry [06-FEB-2008 07:39:38:198] nstimarmed: no timer allocated [06-FEB-2008 07:39:38:198] nstimarmed: normal exit [06-FEB-2008 07:39:38:198] nttctl: entry [06-FEB-2008 07:39:38:198] nttctl: entry [06-FEB-2008 07:39:38:198] nsdo: entry [06-FEB-2008 07:39:38:198] nsdo: cid=0, opcode=66, *bl=0, *what=0,uflgs=0x0, cflgs=0x2
[06-FEB-2008 07:39:38:198] nsdo: rank=64, nsctxrnk=0 [06-FEB-2008 07:39:38:198] nsdo: nsctx: state=1, flg=0x4009, mvd=0 [06-FEB-2008 07:39:38:198] nsdo: entry [06-FEB-2008 07:39:38:199] nsdo: cid=0, opcode=67, *bl=0, *what=1,uflgs=0x0, cflgs=0x1
[06-FEB-2008 07:39:38:199] nsdo: nsctx: state=1, flg=0x4009, mvd=0 [06-FEB-2008 07:39:38:199] nsdo: error exit [06-FEB-2008 07:39:38:199] nsbfr: entry [06-FEB-2008 07:39:38:199] nsbaddfl: entry [06-FEB-2008 07:39:38:199] nsbaddfl: normal exit [06-FEB-2008 07:39:38:199] nsbfr: normal exit [06-FEB-2008 07:39:38:199] nsbfr: entry [06-FEB-2008 07:39:38:199] nsbaddfl: entry [06-FEB-2008 07:39:38:199] nsbaddfl: normal exit [06-FEB-2008 07:39:38:199] nsbfr: normal exit [06-FEB-2008 07:39:38:199] nsdo: nsctxrnk=0 [06-FEB-2008 07:39:38:199] nsdo: normal exit [06-FEB-2008 07:39:38:199] nsiocancel: entry [06-FEB-2008 07:39:38:199] nsiofrrg: entry [06-FEB-2008 07:39:38:199] nsbfr: entry [06-FEB-2008 07:39:38:200] nsbaddfl: entry [06-FEB-2008 07:39:38:200] nsbaddfl: normal exit [06-FEB-2008 07:39:38:200] nsbfr: normal exit [06-FEB-2008 07:39:38:200] nsiofrrg: exit [06-FEB-2008 07:39:38:200] nsiocancel: exit [06-FEB-2008 07:39:38:200] nsclose: closing transport [06-FEB-2008 07:39:38:200] nttdisc: entry [06-FEB-2008 07:39:38:200] nttdisc: Closed socket 9 [06-FEB-2008 07:39:38:200] nttdisc: exit [06-FEB-2008 07:39:38:200] nsclose: global context check-out (fromslot 0) complete
[06-FEB-2008 07:39:38:200] nsnadisc: entry [06-FEB-2008 07:39:38:200] nadisc: entry [06-FEB-2008 07:39:38:200] nacomtm: entry [06-FEB-2008 07:39:38:200] nacompd: entry [06-FEB-2008 07:39:38:200] nacompd: exit [06-FEB-2008 07:39:38:201] nacompd: entry [06-FEB-2008 07:39:38:201] nacompd: exit [06-FEB-2008 07:39:38:201] nacomtm: exit [06-FEB-2008 07:39:38:201] nas_dis: entry [06-FEB-2008 07:39:38:201] nas_dis: exit [06-FEB-2008 07:39:38:201] nau_dis: entry [06-FEB-2008 07:39:38:201] nau_dis: exit [06-FEB-2008 07:39:38:201] naeetrm: entry [06-FEB-2008 07:39:38:201] naeetrm: exit [06-FEB-2008 07:39:38:201] naectrm: entry [06-FEB-2008 07:39:38:201] naectrm: exit [06-FEB-2008 07:39:38:201] nagbltrm: entry [06-FEB-2008 07:39:38:201] nau_gtm: entry [06-FEB-2008 07:39:38:201] nau_gtm: exit [06-FEB-2008 07:39:38:201] nagbltrm: exit [06-FEB-2008 07:39:38:201] nadisc: exit [06-FEB-2008 07:39:38:202] nsnadisc: normal exit [06-FEB-2008 07:39:38:202] nsopenfree_nsntx: nlhthdel frommplx_ht_nsgbu, ctx=10237b50 nsntx=102504f0
[06-FEB-2008 07:39:38:202] nsiocancel: entry [06-FEB-2008 07:39:38:202] nsiofrrg: entry [06-FEB-2008 07:39:38:202] nsiofrrg: exit [06-FEB-2008 07:39:38:202] nsiocancel: exit [06-FEB-2008 07:39:38:202] nsmfr: entry [06-FEB-2008 07:39:38:202] nsmfr: 2160 bytes at 0x1102504f0 [06-FEB-2008 07:39:38:202] nsmfr: normal exit [06-FEB-2008 07:39:38:202] nsmfr: entry [06-FEB-2008 07:39:38:202] nsmfr: 154 bytes at 0x110254db0 [06-FEB-2008 07:39:38:202] nsmfr: normal exit [06-FEB-2008 07:39:38:202] nsmfr: entry [06-FEB-2008 07:39:38:202] nsmfr: 264 bytes at 0x110217230 [06-FEB-2008 07:39:38:202] nsmfr: normal exit [06-FEB-2008 07:39:38:203] nladtrm: entry [06-FEB-2008 07:39:38:203] nladtrm: exit [06-FEB-2008 07:39:38:203] nsmfr: entry [06-FEB-2008 07:39:38:203] nsmfr: 1144 bytes at 0x110237b50 [06-FEB-2008 07:39:38:203] nsmfr: normal exit [06-FEB-2008 07:39:38:203] nsclose: normal exit [06-FEB-2008 07:39:38:203] nigcui: entry [06-FEB-2008 07:39:38:203] nigcui: Clr User Interrupt: hdl=1,prc=0x100c7c98, ctx=0x10216630.
[06-FEB-2008 07:39:38:203] nigcui: exit (0) [06-FEB-2008 07:39:38:203] nioqds: exit [06-FEB-2008 07:39:38:203] nsbfree: entry [06-FEB-2008 07:39:38:203] nsbgetfl: entry [06-FEB-2008 07:39:38:203] nsbgetfl: normal exit [06-FEB-2008 07:39:38:203] nsbaddfl: entry [06-FEB-2008 07:39:38:204] nsbaddfl: normal exit [06-FEB-2008 07:39:38:204] nsbfree: normal exit [06-FEB-2008 07:39:38:204] nsbfree: entry [06-FEB-2008 07:39:38:204] nsbgetfl: entry [06-FEB-2008 07:39:38:204] nsbgetfl: normal exit [06-FEB-2008 07:39:38:204] nsbaddfl: entry [06-FEB-2008 07:39:38:204] nsbaddfl: normal exit [06-FEB-2008 07:39:38:204] nsbfree: normal exit [06-FEB-2008 07:39:38:204] nigtrm: Count in the NI global area is now2
[06-FEB-2008 07:39:38:204] nsbfrfl: entry [06-FEB-2008 07:39:38:204] nsbrfr: entry [06-FEB-2008 07:39:38:204] nsbrfr: nsbfs at 0x10253bb0, data at0x10250e30.
[06-FEB-2008 07:39:38:204] nsbrfr: normal exit [06-FEB-2008 07:39:38:204] nsbrfr: entry [06-FEB-2008 07:39:38:204] nsbrfr: nsbfs at 0x102544b0, data at0x10254570.
[06-FEB-2008 07:39:38:205] nsbrfr: normal exit [06-FEB-2008 07:39:38:205] nsbrfr: entry [06-FEB-2008 07:39:38:205] nsbrfr: nsbfs at 0x10250d70, data at0x10252670.
[06-FEB-2008 07:39:38:205] nsbrfr: normal exit [06-FEB-2008 07:39:38:205] nsbrfr: entry [06-FEB-2008 07:39:38:205] nsbrfr: nsbfs at 0x101ec870, data at0x10253c70.
[06-FEB-2008 07:39:38:205] nsbrfr: normal exit [06-FEB-2008 07:39:38:205] nsbrfr: entry [06-FEB-2008 07:39:38:205] nsbrfr: nsbfs at 0x101ec930, data at0x10252eb0.
[06-FEB-2008 07:39:38:205] nsbrfr: normal exit [06-FEB-2008 07:39:38:205] nsbfrfl: normal exit [06-FEB-2008 07:39:38:205] nttdisc: entry [06-FEB-2008 07:39:38:205] nttdisc: Closed socket 8 [06-FEB-2008 07:39:38:205] nttdisc: exit [06-FEB-2008 07:39:38:205] nigtrm: Count in the NL global area is now3 Received on Wed Feb 06 2008 - 07:44:06 CST