Re: Effect of listener on existing connections?

From: joel garry <>
Date: Fri, 13 Jun 2008 10:34:47 -0700 (PDT)
Message-ID: <>

On Jun 13, 6:40 am, wrote:
> Hi all,
> Oracle support is not giving us satisfactory results.  Perhaps you can
> give some answers?
> We've recently upgraded our system to Oracle, running on
> Solaris (sparc) 10 inside a ZFS zone (our previous system was Oracle
> running on sparc Solaris 8, and was running on that for the
> last 5 years).  Since the upgrade 6 weeks ago, we've had two instances
> where our applications (running in the same O/S environment on a
> different node on the cluster) have locked up - existing connections
> to Oracle become unresponsive when executing SQL (with no error
> message - they just block), and attempts to create new connections are
> met with the error:
> "ORA-03135: connection lost contact".
> The first time this happened, the outage lasted for about 5 minutes,
> then it went away, and execution proceeded normally.  The second time
> it happened it lasted for 25 minutes until we were able to intervene
> manually, and "fixed" the problem by restarting the instance &
> listener.  During the time of the outage, there were no error messages
> in the alert.log, listener.log or /var/adm/messages.  However, a few
> minutes after normal operation was restored (the first time), and
> right as we were restarting the instance (the second time), we saw
> these messages appear in the alert.log:
> "WARNING: inbound connection timed out (ORA-3136)".
> We also see these messages appearing with regularity in our listener
> log, during all times of operation (not just in proximity to the
> outage):
> "WARNING: Subscription for node down event still pending"
> We opened an SR with Oracle Support, but so far, I'm unimpressed with
> their response.  They've told me nothing that I hadn't already found on
> Google from searching for those error messages - namely that we need
> to add some lines to our listener.ora & sqlnet.ora:
> listener.ora:
> sqlnet.ora:
> We've made these changes, but I have low confidence that they will
> actually solve the problem (and I've told Oracle as much) for the
> following reasons:
> 1. The SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER value is to address the
> issue of the listener locking up if you're not using ONS.  However,
> how does a blocked listener explain the fact that apps with existing
> connections to the db become blocked?  My understanding (and I could
> be wrong here) is that once you're connected to the instance, there is
> no further authentication that needs to be performed.  Our in-house
> experiments also show that we can "kill -STOP" the listener and apps
> with existing connections continue to perform normally.
> 2. The INBOUND_CONNECT_TIMEOUT value is to address the issue with
> clients that are "slow to authenticate", but the "WARNING: inbound
> connection timed out (ORA-3136)" message appears AFTER the crisis
> interval - in the recent case, it appeared 20 minutes after our app
> became blocked.  I would expect to see it within 60 seconds of the
> block, as that's what the value currently is.
> 3. All apps trying to establish new connections (including sqlplus,
> running on the same node as the instance) received the login error,
> not just "certain apps" as described in the Oracle tech note
> 274303.1.  Why would an app like sqlplus, running on the same box as
> the server (which was the case here) need more than 1 minute to
> authenticate when logging in?  Even our own apps shouldn't be taking
> long to authenticate.
> Should this event occur again, I don't see how Support will be able to
> resolve it, as they haven't asked me for any additional info.  I want
> to know from them what steps I need to take to gather information so
> that they can REALLY fix the problem, or give me an answer that
> unambiguously addresses the issue, rather than just googling on error
> numbers.  So far my requests for a clear action plan from them have
> been met with the email equivalent of a blank, slack-jawed stare.
> =========
> So, my basic questions are:
> 1.  Can listener unavailability cause existing client connections to
> become unresponsive?
> 2.  If the answer to #1 is "no", is there some way I can escalate this
> issue within Support to get to an analyst who actually understands how
> the Oracle server works, and is capable of doing something other than
> typing search queries into metalink?
> Thanks,
> -S

Have you checked for TCP issues? Do you have lots of ports in use? What kind of timeout settings do you have? Do you see any FIN_WAIT type port blocking when the issue occurs, or ports staying in that state for a long time? Have you modified sqlnet to use larger buffer sizes? What kind of ping response do you get during the problem times?

Just some questions I would ask. I'd be wondering about the OS not being nice to Oracle.
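
One rough way to check the port-state question is to tally TCP states from netstat output while the problem is happening and compare against a normal period. The sketch below is only an illustration: the function names are made up here, it assumes `netstat -an` prints the state as the last column of each TCP line (true for the Solaris and Linux versions I know of, but verify on your box), and the state list covers both spellings.

```python
import subprocess
from collections import Counter

# TCP state names as printed by netstat. Solaris and Linux spell a few
# of them differently, so both variants are listed (assumed, not exhaustive).
TCP_STATES = {
    "ESTABLISHED", "LISTEN", "TIME_WAIT", "CLOSE_WAIT", "LAST_ACK",
    "CLOSING", "CLOSED", "SYN_SENT", "IDLE", "BOUND",
    "FIN_WAIT_1", "FIN_WAIT_2", "SYN_RCVD",   # Solaris spellings
    "FIN_WAIT1", "FIN_WAIT2", "SYN_RECV",     # Linux spellings
}


def count_tcp_states(netstat_output):
    """Count lines per TCP state, using the last column as the state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


def snapshot():
    """Run `netstat -an` once and return the state counts."""
    out = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True
    ).stdout
    return count_tcp_states(out)
```

Calling `snapshot()` every few seconds from a loop or cron and logging the result would show whether FIN_WAIT or TIME_WAIT counts climb around the time connections start to hang.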


-- is bogus.
"That's strange..." - what you don't want to hear trying to track down
denied insurance claims.
Received on Fri Jun 13 2008 - 12:34:47 CDT
