Re: OEL - fork: Resource temporarily unavailable

From: Mihajlo Tekic <mihajlo.tekic_at_gmail.com>
Date: Thu, 22 Sep 2011 09:49:07 -0500
Message-ID: <CAGWRspa=R4Xgq8dNroeWJQ5VfK7S6LKX-pWs=81_ubr4YNms9g_at_mail.gmail.com>



4000 concurrent sessions per node? I assume you mean active sessions. That raises a question: how many resources do you have available on each node to support all these connections?

You mentioned 128 GB of RAM, but is that enough?

Are you also experiencing performance problems with the existing sessions?

Anyway, “fork: Resource temporarily unavailable” is fairly self-explanatory: by and large, it indicates that some resource has been exhausted.

From your strace output, it looks like the clone() call is failing to create a child process:



clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b9c118e8670) = -1 EAGAIN (Resource temporarily unavailable)

According to the clone(2) man page, EAGAIN from clone() means that too many processes are already running, so the child process could not be created.

http://linux.die.net/man/2/clone
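One way to see how close a node is to the system-wide limits behind that EAGAIN is to compare the number of existing tasks (processes plus threads) with kernel.threads-max and kernel.pid_max. This is a minimal sketch assuming Linux with /proc mounted and Python 3; the function names are hypothetical helpers, not part of any Oracle tooling.

```python
from pathlib import Path


def total_tasks(loadavg_text: str) -> int:
    """Number of kernel scheduling entities (processes + threads) in existence.

    The 4th field of /proc/loadavg has the form "runnable/total".
    """
    _runnable, total = loadavg_text.split()[3].split("/")
    return int(total)


def read_int(path: str) -> int:
    """Read a single integer from a /proc/sys file."""
    return int(Path(path).read_text().strip())


def system_task_headroom() -> dict:
    """Compare existing tasks with the system-wide limits (Linux only)."""
    tasks = total_tasks(Path("/proc/loadavg").read_text())
    threads_max = read_int("/proc/sys/kernel/threads-max")
    pid_max = read_int("/proc/sys/kernel/pid_max")
    return {
        "tasks": tasks,
        "threads_max": threads_max,
        "pid_max": pid_max,
        # Approximate: every thread also consumes a PID, so new processes
        # cannot be created once either limit is reached.
        "approx_headroom": min(threads_max, pid_max) - tasks,
    }
```

Calling print(system_task_headroom()) on each node while the problem is occurring should show whether the approximate headroom is near zero.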

According to the fork(2) man page, the call can fail with the following errors:

http://linux.die.net/man/2/fork


1. EAGAIN: fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.

2. EAGAIN: It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.

3. ENOMEM: fork() failed to allocate the necessary kernel structures because memory is tight.
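For the RLIMIT_NPROC case, the per-user limit can be compared with what that user actually has running. On Linux, RLIMIT_NPROC counts threads belonging to the real user ID, so this sketch sums the Threads: field of /proc/<pid>/status for that UID. Assumptions: Linux, Python 3.9+, and that it runs as the OS user that owns the Oracle processes; the helper names are hypothetical. The limit actually in force for the database processes is whatever they inherited when they were started, which may differ from what a login shell reports with `ulimit -u`.

```python
import os
import resource
from pathlib import Path


def parse_status(text: str) -> tuple[int, int]:
    """Return (real_uid, thread_count) from the text of /proc/<pid>/status."""
    uid = threads = -1
    for line in text.splitlines():
        if line.startswith("Uid:"):
            uid = int(line.split()[1])  # columns: real, effective, saved, fs
        elif line.startswith("Threads:"):
            threads = int(line.split()[1])
    return uid, threads


def threads_for_uid(uid: int) -> int:
    """Total threads owned by `uid`, the quantity RLIMIT_NPROC limits on Linux."""
    total = 0
    for status in Path("/proc").glob("[0-9]*/status"):
        try:
            owner, threads = parse_status(status.read_text())
        except OSError:
            continue  # the process exited while we were scanning
        if owner == uid and threads > 0:
            total += threads
    return total


def nproc_report() -> None:
    """Print current thread usage next to this process's RLIMIT_NPROC."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def fmt(limit: int) -> str:
        return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

    uid = os.getuid()
    print(f"threads owned by uid {uid}: {threads_for_uid(uid)}")
    print(f"RLIMIT_NPROC soft={fmt(soft)} hard={fmt(hard)}")
```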


Check your memory consumption and see if there is enough memory available.
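A minimal sketch of that memory check, reading /proc/meminfo. Assumptions: Linux and Python 3.9+; the helper names are hypothetical. MemAvailable only exists on kernels 3.14 and later, so on older kernels the sketch falls back to the rough MemFree + Buffers + Cached estimate. PageTables is reported because of the first EAGAIN case above: every process that maps a large shared memory segment needs its own page tables, and with thousands of processes that can add up.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field to its numeric value (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info: dict[str, int]) -> dict[str, int]:
    """Summarize memory in MB; 'available' is a rough estimate on old kernels."""
    if "MemAvailable" in info:
        available_kb = info["MemAvailable"]
    else:
        available_kb = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "total_mb": info["MemTotal"] // 1024,
        "available_mb": available_kb // 1024,
        "page_tables_mb": info.get("PageTables", 0) // 1024,
    }


def read_memory_summary() -> dict[str, int]:
    """Read and summarize /proc/meminfo on the local node (Linux only)."""
    return memory_summary(parse_meminfo(Path("/proc/meminfo").read_text()))
```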

You also indicated that 32K processes were running on the server when the issue occurred. Have you checked for any defunct (zombie) processes?
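Defunct (zombie) processes keep their PID and still count against RLIMIT_NPROC and the system-wide task limits until their parent reaps them. With roughly 32K processes, it is also worth noting that the default kernel.pid_max is 32768. This sketch lists zombies together with their parent PID, since the parent is the process that is failing to reap them; it assumes Linux with /proc, and the helper names are hypothetical. `ps -eo pid,ppid,stat,comm` shows the same information.

```python
from pathlib import Path


def parse_stat(text: str) -> tuple[int, str, str, int]:
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = tail.split()
    return int(pid_str), comm, fields[0], int(fields[1])


def find_zombies() -> list[tuple[int, str, int]]:
    """List (pid, comm, ppid) of processes in state 'Z' (defunct)."""
    zombies = []
    for stat in Path("/proc").glob("[0-9]*/stat"):
        try:
            pid, comm, state, ppid = parse_stat(stat.read_text())
        except OSError:
            continue  # the process vanished during the scan
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies
```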

Aside from what I’ve indicated above, you may also be hitting one of the known 11.2 bugs, such as 8841501, 9356344, 9398412, 9944177, 9234660, or 9855476 (check MOS Note 1062676.1).

Although CPU might not be the problem in this particular case, running 4K processes concurrently can also cause heavy CPU utilization. How many CPUs (cores) does each node have? Since this is a RAC environment, if CPU is 100% utilized and you are not using Resource Manager, the heavily utilized node may also be evicted. Has this happened?

Maybe you should consider using a connection pooling mechanism or, if you already use one, check that it is used appropriately and efficiently. Stephane’s comment about shared servers is also valid.
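For the CPU question above, a quick per-node sanity check is to divide the active session count and the load average by the number of CPUs. This is a hedged sketch assuming a Unix system and Python 3; `active_sessions` defaults to the 4000 from the original question, and the helper names are hypothetical. Note that os.cpu_count() returns logical CPUs (hyperthreads included), not physical cores.

```python
import os


def per_core(value: float, cores: int) -> float:
    """Divide a session count or load figure by the number of CPUs."""
    return value / cores


def cpu_report(active_sessions: int = 4000) -> None:
    """Print sessions and load per CPU for the local node (Unix only)."""
    cores = os.cpu_count() or 1  # logical CPUs, including hyperthreads
    one_min, five_min, fifteen_min = os.getloadavg()
    print(f"logical CPUs: {cores}")
    print(f"active sessions per CPU: {per_core(active_sessions, cores):.1f}")
    # Linux load averages also count tasks in uninterruptible (D) sleep,
    # so a high value is not purely CPU demand.
    loads = (one_min, five_min, fifteen_min)
    print("load per CPU (1/5/15 min): "
          + ", ".join(f"{per_core(x, cores):.2f}" for x in loads))
```

A load per CPU well above 1.0 means processes are queuing for CPU.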

Hope this helps.

~Mihajlo

> On 09/22/2011 06:12 AM, Upendra N wrote:
> > yeah. This is a very heavily used app/db, we see 4000 co...
>
> http://www.freelists.org/webpage/oracle-l

Received on Thu Sep 22 2011 - 09:49:07 CDT
