

Re: Responsiveness of Server at high CPU load

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: 16 Dec 2003 22:17:50 -0800
Message-ID: <1a75df45.0312162217.1c71cd32@posting.google.com>


Rick Denoire <100.17706_at_germanynet.de> wrote

> From time to time, our Oracle test server (9.2.0.4 on Intel/Linux, 2
> CPUs) got unusable at CPU load of 99% as shown by top; in this state,
> nothing else could be done with Oracle, even trying to connect via
> sqlplus took about 1 hour (assuming one would wait that long).
<snipped>

Interesting.

How about the other Oracle sessions that are connected? Are they responding?

Are you using MTS or dedicated server or a mix?

If a process is consuming CPU resources to such an extent that the system stops responding, then it is a kernel issue IMO. Unless of course the app ups its process priority to something like real-time. But then Oracle does not muck about with process priorities - Oracle simply forks/threads with default process priority.
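To verify the priority assumption rather than take it on faith, something like the following could be run on the box while it still responds. It is only a rough sketch for Linux: the helper names are made up for illustration, and the "ora_" / "oracle" prefixes are my assumption about how the Oracle background and server processes show up in the process list on your system.

```python
import os

# Human-readable names for the common Linux scheduling policies.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",  # normal time-sharing
    os.SCHED_FIFO: "SCHED_FIFO",    # real-time, first in first out
    os.SCHED_RR: "SCHED_RR",        # real-time, round robin
}

def find_pids(prefixes=("ora_", "oracle"), proc_root="/proc"):
    """Return PIDs whose argv[0] starts with one of the prefixes.

    The default prefixes are an assumption about Oracle process naming.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0", 1)[0].decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if argv0.startswith(prefixes):
            pids.append(int(entry))
    return pids

def scheduling_info(pids):
    """Map each PID to (nice value, scheduling policy name)."""
    info = {}
    for pid in pids:
        try:
            nice = os.getpriority(os.PRIO_PROCESS, pid)
            policy = os.sched_getscheduler(pid)
        except OSError:
            continue  # process vanished or lookup not permitted
        info[pid] = (nice, POLICY_NAMES.get(policy, str(policy)))
    return info

if __name__ == "__main__":
    for pid, (nice, policy) in sorted(scheduling_info(find_pids()).items()):
        print(pid, nice, policy)
```

With default priorities one would expect a nice value of 0 and SCHED_OTHER for every process listed. Anything showing SCHED_FIFO or SCHED_RR would be worth a closer look.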

If it is indeed a kernel issue (aka bug), then the *complete* system will be unresponsive - i.e. no telnet/ftp connections accepted, existing telnet session commands and responses very slow, etc.

Is this the case? If so, then IMO you have run into a Linux kernel bug or something along those lines.
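One way to put a number on "unresponsive" is to time a connection to a non-Oracle service during the incident and compare it with a quiet period. The sketch below is an illustration only: the probe() function, the host and the ports are hypothetical and need adjusting to whatever actually runs on the server. Note that the TCP handshake itself is completed by the kernel, so waiting for the first byte of a banner (as ftp sends one) says more about whether user processes still get scheduled.

```python
import socket
import time

def probe(host, port, timeout=5.0, wait_for_banner=False):
    """Return seconds taken to connect (and optionally to receive the
    first byte of a greeting), or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if wait_for_banner:
                # An empty read means the peer closed without greeting us.
                if not sock.recv(1):
                    return None
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None

if __name__ == "__main__":
    # Hypothetical targets: ftp (21) greets with a banner, telnet (23)
    # is only checked for a completed connect.
    print("ftp   :", probe("localhost", 21, wait_for_banner=True))
    print("telnet:", probe("localhost", 23))
```

If these figures stay in the millisecond range while sqlplus takes an hour to connect, the box as a whole is still responding and the problem is more likely specific to Oracle.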

If not, then it means that this is not a 100% CPU issue, but something else. The high CPU utilisation is thus not the primary cause, but more likely the thing that makes the problem stand out very clearly. Maybe there's another resource that gets exhausted at 100% CPU utilisation. Or at that CPU level a certain blocking call at kernel level times out. Or signals/messages are lost, causing async calls to time out because the caller thinks that the async op never completed.

I have seen various flavours of these errors on various flavours of Unix kernels. The last one (async calls never receiving notification of op completion) was a particularly nasty one we ran into on the ReliantUnix kernel some years ago.

Lastly, if the existing Oracle sessions are still responding, then make sure that you establish a sysdba connection up front. Use that session to pop the hood and dig around the v$ tables (especially events and waits) to see what Oracle is doing and what it is waiting for during the time that no new Oracle connections are accepted.
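As a rough sketch of that digging, the following assumes a session that was already connected as sysdba before the problem started, exposed through a Python DB-API 2.0 cursor. The driver is deliberately left out, the function names are made up for illustration, and filtering out the idle 'SQL*Net message from client' wait is my own choice, not a rule.

```python
from collections import Counter

# v$session_wait shows what each session is currently waiting on.
WAIT_QUERY = """
SELECT sid, event, state, seconds_in_wait
  FROM v$session_wait
 WHERE event NOT LIKE 'SQL*Net message from client%'
 ORDER BY seconds_in_wait DESC
"""

def current_waits(cursor, limit=20):
    """Return up to `limit` rows of (sid, event, state, seconds_in_wait)."""
    cursor.execute(WAIT_QUERY)
    return cursor.fetchmany(limit)

def waits_by_event(rows):
    """Count how many sessions are waiting on each event."""
    return Counter(event for _sid, event, _state, _secs in rows)
```

Running current_waits() every few seconds while new connections hang, and feeding the rows to waits_by_event(), should show whether the sessions pile up on one particular event.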

Oh yeah - is there something equivalent to truss on Linux? That can be a very useful tool at times like this.

--
Billy
Received on Wed Dec 17 2003 - 00:17:50 CST