Re: the opposite of DCD (dead connection detection)??

From: joel garry <joel-garry_at_home.com>
Date: Tue, 5 Feb 2008 11:31:13 -0800 (PST)
Message-ID: <cb952cb4-65e7-4367-a35e-2e841a5eb732@f10g2000hsf.googlegroups.com>


On Feb 5, 5:12 am, Ben <benal..._at_yahoo.com> wrote:
> On Feb 5, 7:51 am, Ben <benal..._at_yahoo.com> wrote:
>
>
>
>
>
> > On Feb 4, 4:13 pm, Ben <benal..._at_yahoo.com> wrote:
>
> > > On Jan 11, 1:42 pm, joel garry <joel-ga..._at_home.com> wrote:
>
> > > > On Jan 10, 10:59 am, Ben <benal..._at_yahoo.com> wrote:
>
> > > > > On Jan 9, 11:14 am, bdbafh <bdb..._at_gmail.com> wrote:
>
> > > > > > On Jan 9, 10:52 am, Ben <benal..._at_yahoo.com> wrote:
>
> > > > > > > 10.2.0.2 AIX 5.2 64bit.
>
> > > > > > > Idle connections are getting terminated in databases on one of our
> > > > > > > servers. I've checked profiles and resource plans to see if any have a
> > > > > > > limit on idle time and there are none. It was first noticed in an
> > > > > > > application and I've since been testing scenarios using sqlplus and it
> > > > > > > happens with sqlplus also. I also thought it was limited to
> > > > > > > connections that were coming in from another server but that isn't the
> > > > > > > case either.
>
> > > > > > > Certain things that stand out. It doesn't happen if the connection is
> > > > > > > for the sysdba user. It doesn't happen if the connection is a local
> > > > > > > connection that doesn't have the @sid in the connection string.
>
> > > > > > > Any thoughts?
>
> > > > > > Is there a host-based firewall in use (meaning on the localhost)?
>
> > > > > > Can you test out this for connection on the same subnet (that would
> > > > > > not traverse a router) vs. connections from clients not on the same
> > > > > > subnet?
>
> > > > > > This sure smells like a network issue where (translated) connections
> > > > > > are being timed out.
>
> > > > > > Start working with the network techs to see if they have any ideas.
>
> > > > > > -bdbafh
>
> > > > > Just another little piece of info. I setup a cronjob to see how long
> > > > > the connections stay connected when they are idle, looks like
> > > > > approximately 12 hours
>
> > > > That sounds like a TCP setting.  Seehttp://www.tek-tips.com/viewthread.cfm?qid=1407908&page=12
>
> > > > Found by googling on:
> > > > tcp port timeout aix
>
> > > > I think he was looking at the wrong setting in that post, but I'm not
> > > > on AIX so can't comment further.  But I believe there are more
> > > > settings, including some that timeout the port after a longer time
> > > > than seconds.  Might have time or wait in the name.
>
> > > > This one seems to be saying I'm wrong:http://unix.derkeiler.com/Newsgroups/comp.unix.aix/2006-12/msg00359.html
>
> > > > Maybe there's some resource limit on the AIX side... like the ulimit
> > > > cpu seconds?
>
> > > > jg
> > > > --
> > > > @home.com is bogus.
> > > > "A slipping gear could let your M203 grenade launcher fire when
> > > > you least expect it. That would make you quite unpopular in
> > > > what's left of  your unit." - Army's magazine of preventive
> > > > maintenance.- Hide quoted text -
>
> > > > - Show quoted text -
>
> > > Is there a table that can be updated to make oracle think a session
> > > has been idle for longer than it actually has?
> > > I would like to determine for sure if this problem is from the
> > > database or the OS. Our admin installed the patch that she found but
> > > it didn't help.- Hide quoted text -
>
> > > - Show quoted text -
>
> > I think it is something with the database that is causing the issue.
>
> > I believe I am getting the following error in the sqlnet.log everytime
> > this happens. The only problem is that I don't have a sqlnet.ora setup
> > where I would expect it to be, as a matter of fact I don't think there
> > is one period. This looks like a timeout to me, but I thought the
> > default was unlimited.
>
> > ***********************************************************************
> > Fatal NI connect error 12170.
>
> >   VERSION INFORMATION:
> >         TNS for IBM/AIX RISC System/6000: Version 10.2.0.2.0 -
> > Production
> >         TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000:
> > Version 10.2.0.2.0 - Production
> >         Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/
> > 6000: Version 10.2.0.2.0 - Production
> >   Time: 04-FEB-2008 11:05:02
> >   Tracing not turned on.
> >   Tns error struct:
> >     ns main err code: 12535
> >     TNS-12535: TNS:operation timed out
> >     ns secondary err code: 12560
> >     nt main err code: 505
> >     TNS-00505: Operation timed out
> >     nt secondary err code: 78
> >     nt OS err code: 0
> >   Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.31.249.254)
> > (PORT=57681))
>
> > ***********************************************************************
> > Fatal NI connect error 12170.
>
> >   VERSION INFORMATION:
> >         TNS for IBM/AIX RISC System/6000: Version 10.2.0.2.0 -
> > Production
> >         TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000:
> > Version 10.2.0.2.0 - Production
> >         Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/
> > 6000: Version 10.2.0.2.0 - Production
> >   Time: 04-FEB-2008 19:46:53
> >   Tracing not turned on.
> >   Tns error struct:
> >     ns main err code: 12535
> >     TNS-12535: TNS:operation timed out
> >     ns secondary err code: 12560
> >     nt main err code: 505
> >     TNS-00505: Operation timed out
> >     nt secondary err code: 78
> >     nt OS err code: 0
> >   Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.31.249.254)
> > (PORT=61393))- Hide quoted text -
>
> > - Show quoted text -
>
> Side note: I just did a find as root over the entire system and The
> only sqlnet.ora file is under ORACLE_HOME/network/admin/samples
>
> Also, the errors listed in the sqlnet.log are related to the logon
> timing out, correct? The thing that confuses me about that is that I
> don't have any issues with logging on. I can log the session on and
> run some queries, then leave it idle for 12 hours and it gets killed

Hmmmm:

$ oerr tns 12535
12535, 00000, "TNS:operation timed out"
// *Cause: The requested connection could not be completed within the timeout
// period specified by the CONNECT_TIMEOUT parameter in listener.ora. This

// error arises from the tnslsnr.
// *Action: Either reconfigure CONNECT_TIMEOUT to be 0, which means
// wait indefinitely, or reconfigure CONNECT_TIMEOUT to be some higher
// value. Or, if the timeout is unacceptably long, turn on tracing
// for further information.

Of course, that error message is obsolete, see http://download.oracle.com/docs/cd/B19306_01/network.102/b14213/unsupported.htm#sthref994 and http://download.oracle.com/docs/cd/B19306_01/network.102/b14213/listener.htm#i503196

This all kind of strikes me as confusing too, perhaps there is a platform specific bug for AIX. Also see metalink Note:345197.1 There must be something trying to log on after 12 hours idle is all I can guess. Try setting the new parameters and tracing and see.

Note:472572.1 is a long shot, but hey, maybe just one of your servers is missing an OS patch...

jg

--
@home.com is bogus.
http://www.nytimes.com/2008/02/05/science/space/05spotters.html?_r=1&oref=slogin
Received on Tue Feb 05 2008 - 13:31:13 CST

Original text of this message