Re: Responsiveness of Server at high CPU load

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: 18 Dec 2003 00:40:19 -0800
Message-ID: <1a75df45.0312180040.217d57f5@posting.google.com>

Rick Denoire <100.17706_at_germanynet.de> wrote

> So I am asked to DO SOMETHING to
> solve the problem immediately. In this situation, I have no chance to
> investigate anything.

That's a stupid request they gave you Rick. Do they expect you to wave a magic wand and solve the problem without identifying the actual problem and then determining how to address it?

> Last time, though, I noticed that two instances were active. I could
> hardly connect to any of them. My mistake was not to try the quiescent
> instances too... So I can't tell now whether ALL instances were
> suffering the same problem.

I strongly favour a single instance per platform. Multiple instances... well, there are a lot more cons than pros.

If both instances are being hammered, they could cause contention at kernel level when it is configured with insufficient resources. What are your kernel.shmmax and kernel.sem parameters set to?

> But definitely, at the OS level, one could still more or less work
> fluently.

As expected. Which then eliminates something like a kernel sheduling bug.

> >Are you using MTS or dedicated server or a mix?
>
> Dedicated server only.

Check PGA usage. A single Oracle dedicated process can play havoc with memory and disk i/o. Also check how many recursive SQLs are being generated. From what you described thus far, it seems pretty much familiar to what I've seen a run-away production job with Bad SQL (tm) does to a HP-UX box. Oracle crawls to almost a standstill with the box showing very high CPU, memory and disk utilisation.

> >Lastly, if the existing Oracle sessions are still responding, then
> >make sure that you establish a sysdba connection up front. Use that
> >session to pop the hood and dig around the v$ tables (especially
> >events and waits) to see what Oracle is doing and what it it waiting
> >for during the time that no new Oracle connections are accepted.
>
> There is no way to investigate anything, the system just won't answer.

Define what the contractors are running. Take those SQLs and put them through explain plan. Test drive those SQLs on smaller data sets (adding rownum stop criteria can often do the trick here).

From what you describe, my experience has been that the actual problem is seldom with the Oracle instance configuration itself. Or even the operating system.

Most times it is simply exceedingly bad SQLs being run. And Oracle performing awesomely fast trying to do what those SQLs ask.

In one case on a 80GB production database, I've seen bad SQLs (from a single production process) being thrown at Oracle causing it to physically read and process over 1.3 Terrabytes of data in under 40 hours. With about 10 million transactions (commits) being done during that time period. And this on an old K-class HP box.

Oracle can do the job performance wise. How well that performance capability is put to use is the biggest problem I encounter time and again. And that is what I suspect you are running into.

> >Oh yeah - is there something equivalent to truss on Linux? That can be
> >a very useful tool at times like this.
>
> I tried pstack <Oracle_process_id> and waited, waited.... until I gave
> up.

pstack is not it. truss is more like an interactive ptrace of system calls.

--
Billy

Received on Thu Dec 18 2003 - 02:40:19 CST