FW: ORA-3136 Inbound Connection Time out

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 24 Feb 2011 09:09:07 -0500
Message-ID: <01e901cbd42c$671dd910$35598b30$_at_rsiz.com>



<snipped to fit to list>
 

From: Mark W. Farnham [mailto:mwf_at_rsiz.com]
Sent: Thursday, February 24, 2011 9:01 AM
To: 'tim_at_evdbt.com'; 'Mohammad Rafiq'
Cc: 'oracle list'
Subject: RE: ORA-3136 Inbound Connection Time out  

What Tim said. Also what Micheal said, especially as the ADDM report suggests a larger SGA. Since you're already at 20G I think you're indicating that increasing the SGA would be problematic.  

Since this previously worked for a long time and it is difficult to scope session tracing for sessions that die trying to connect, I'll offer an incomplete laundry list (none of which are silver bullets, but you might get lucky and these are cheap to check):  

  1. Did some ETL or other batch process that formerly completed before the heavy logon window recently grow in duration, so that it now consumes resources during this period? If so, doing what you need to do to make it finish sooner might fix the problem.
  2. Did someone change auditing? Not only can this drive a sequence thrashing issue, it may drive resource allocations and recursive levels that you did not have before. Some folks don't consider changing auditing to be a "change." It sure can be at the margin of overall system headroom.
  3. Are you running shared server, and has the increased load left too many servers bogged down processing single requests? If so, putting the sessions prone to long duration transactions on dedicated connections may alleviate server spawning thrash issues. This is a slippery slope and your mileage may vary.
  4. If you check the listener process, is it pegged on CPU? It has been at least a decade since I've had to run multiple listeners to support the rate of connections, but when that was the problem it presented symptoms similar to those you report. I thought that was partly due to a long-since-repaired FIFO bug where a connection attempt that couldn't be serviced immediately was shoved down, never reached the top again, and eventually timed out. That does NOT present as a network problem (nor should it).
  5. Are there any delays in log switching? If archiving is behind and auditing is on, you can get a pause while waiting for a redo log group to be freed to switch into. Adding redo log groups will widen the window during which you can sustain transactions while archiving throughput is overdriven. Another change sometimes not reported as a change is "Hey, I kept the same total size of online redo, I just wanted to have fewer total files to keep track of, so I made the groups bigger and fewer." In a peak load, smaller files start archiving sooner, so you can sustain being behind the archiving rate longer with more, smaller groups. Having more online redo log groups and more total size increases the overdriven peak you can sustain without affecting service.
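
The arithmetic behind item 5 can be sketched with a toy model. This is a simplification and not how Oracle actually schedules archiving: it assumes a single archiver, constant redo generation and archiving rates, and that a group can only start archiving once it has filled. The function name, its parameters, and the numbers are all made up for illustration.

```python
def seconds_until_stall(total_redo_mb, groups, gen_mb_s, arch_mb_s, max_logs=100_000):
    """Toy model: seconds until the log writer must wait for an unarchived group.

    Returns None if no stall occurs within max_logs log sequences.
    """
    size = total_redo_mb / groups      # size of each online redo group (MB)
    fill_time = size / gen_mb_s        # seconds to fill one group
    arch_time = size / arch_mb_s       # seconds to archive one group
    archived_at = []                   # completion time of archiving, per log sequence
    archiver_free = 0.0                # when the single archiver is next idle
    now = 0.0
    for seq in range(max_logs):
        # Writing sequence `seq` reuses the group that held `seq - groups`,
        # so that older sequence must already be archived.
        if seq >= groups and archived_at[seq - groups] > now:
            return now
        now += fill_time               # the current group fills
        start = max(now, archiver_free)
        archiver_free = start + arch_time
        archived_at.append(archiver_free)
    return None


if __name__ == "__main__":
    # Same 4000 MB of online redo, generated at 100 MB/s, archived at 50 MB/s.
    for n in (2, 4, 8, 16):
        print(n, "groups:", seconds_until_stall(4000, n, 100, 50), "seconds")
```

With these made-up rates, two 2000 MB groups stall after 40 seconds, while eight 500 MB groups holding the same total redo last 70 seconds, because archiving starts after the first 5 seconds of redo instead of the first 20.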

I'm not generally a fan of laundry lists, but these and probably a few I forgot to suggest have the quality that they are cheap to check and rule out or in, and checking them should be a minimal distraction to proceeding on other fronts.  

Good luck,  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Tim Gorman
Sent: Wednesday, February 23, 2011 7:50 PM
To: Mohammad Rafiq
Cc: oracle list
Subject: Re: ORA-3136 Inbound Connection Time out  

Which wait events showed up in ASH during the 5-10 minutes leading up to the ORA-03136 errors found in your alert.log file?
<snip>
 

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Feb 24 2011 - 08:09:07 CST
