Linux fs.aio-max-nr Leak?

From: Kenny Payton <k3nnyp_at_gmail.com>
Date: Mon, 29 Sep 2014 10:50:46 -0400
Message-ID: <CAEidWqPmz=qDpejbX70e0mR41tOc5wX4yVqR0wKBb7x0DTf0DQ_at_mail.gmail.com>



Curious what others are seeing with async I/O requests. I'm running 11gR2 on ASM and have seen what appears to be an aio request leak for a number of years now, across various versions. Oracle's recommendation is to set fs.aio-max-nr to 1M, but I've seen usage float upwards of 5M outstanding requests between bounces. Bouncing the database frees them and the count starts over, but typically I just bump the max on the server dynamically and go on about my day. Most recently we hit our 5M ceiling unexpectedly, because our monitor had been broken during a recent monitoring system upgrade. We have bumped our ceiling to 10M and have our monitor working again, reporting when we cross 50%.
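For anyone who wants a similar 50% check, here is a minimal sketch in Python. It is not the monitor we actually run. It assumes the standard Linux /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr files, and the function names and the 0.5 default threshold are just illustrative. As far as I understand it, aio-nr counts the event slots reserved through io_setup() across all active AIO contexts rather than strictly in-flight requests, so treat the percentage as an approximation.

```python
from pathlib import Path

# Standard Linux locations for the system-wide AIO counters.
AIO_NR = Path("/proc/sys/fs/aio-nr")          # slots currently reserved
AIO_MAX_NR = Path("/proc/sys/fs/aio-max-nr")  # configured ceiling


def read_counter(path):
    """Read a single integer value from a /proc/sys file."""
    return int(Path(path).read_text().strip())


def aio_usage(aio_nr, aio_max_nr):
    """Return aio-nr as a fraction of aio-max-nr."""
    if aio_max_nr <= 0:
        raise ValueError("aio-max-nr must be positive")
    return aio_nr / aio_max_nr


def check_aio(aio_nr, aio_max_nr, threshold=0.5):
    """Return (status, message); status is 'WARN' at or above the threshold."""
    usage = aio_usage(aio_nr, aio_max_nr)
    status = "WARN" if usage >= threshold else "OK"
    message = (
        f"{status}: fs.aio-nr={aio_nr} is {usage:.0%} "
        f"of fs.aio-max-nr={aio_max_nr}"
    )
    return status, message


def main():
    """Check the live counters on this host and print one status line."""
    status, message = check_aio(read_counter(AIO_NR), read_counter(AIO_MAX_NR))
    print(message)
    return 1 if status == "WARN" else 0
```

Calling main() from cron or a monitoring agent and alerting on a nonzero return value would be one way to wire it in. The ceiling itself can be raised on a running server with sysctl -w fs.aio-max-nr=<new value>, as root.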

These are pretty active databases. The instance in question for this event is 20T in size, on all-flash storage, and runs around 15k IOPS. Oracle Linux 6.3.

Typically sessions return an error to the client stating that max aio has been reached, but in this particular case we had an odd scenario. A number of sessions wrote the message to their trace files, but instead of returning the error to the client and aborting the statement, the sessions spun on CPU. Neither strace nor a 10046 trace returned any output from the processes, and ultimately we had to kill -9 them to free up the resources. The one thing these sessions had in common was that they were all accessing LOB segments, some with updates and others with plain selects. Possibly a bug in the LOB access code path that is not handling the aio OS message.

Thanks,
Kenny

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Sep 29 2014 - 16:50:46 CEST
