Oracle FAQ Your Portal to the Oracle Knowledge Grid
 


RE: 100% CPU

From: Gaja Krishna Vaidyanatha <oraperfman_at_yahoo.com>
Date: Tue, 20 May 2003 17:21:37 -0800
Message-ID: <F001.0059E77C.20030520172137@fatcity.com>


List,

I have personally dealt with this issue a few times at customer sites, so I thought I'd share my experiences with you. At any rate, this is a thread that is hard to resist...;-) My experiences on "real-life" customer systems truly changed my perspective on the issue.

My observations:



Assuming that the maximum CPU capacity for a given system has been estimated for peak load times, having a system at 100% CPU utilization during peak times is OK. What you need to watch is that the %SYS (OS overhead) and %WIO (percentage of time waiting for I/O) components, available in "sar -u" output, are within reasonable limits. More importantly, the average CPU run queue, shown as the first column ("r") of vmstat output, should be within 2x the number of CPUs.

Note - The %SYS and %WIO components on well-performing systems have usually been observed to be within 10-15%. Use these numbers as guidelines, NOT as draconian rules.
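
The guideline checks above can be written down as a small function. This is a minimal sketch. It assumes %sys and %wio have already been read from "sar -u" and the run queue from the vmstat "r" column for the same interval. The `Interval` type and `check_interval` function are hypothetical names, and the 15% ceiling is simply the upper end of the 10-15% range quoted above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Rules of thumb from the post, not hard limits.
SYS_WIO_CEILING_PCT = 15.0  # upper end of the 10-15% guideline
RUNQ_MULTIPLIER = 2         # run queue should stay within 2x the CPU count


@dataclass
class Interval:
    """One sampling interval (hypothetical structure)."""
    sys_pct: float    # %sys from "sar -u"
    wio_pct: float    # %wio from "sar -u"
    run_queue: float  # "r" column from vmstat
    num_cpus: int


def check_interval(s: Interval) -> list[str]:
    """Return warnings; an empty list means a 100%-busy CPU is acceptable."""
    warnings = []
    if s.sys_pct > SYS_WIO_CEILING_PCT:
        warnings.append(f"%sys {s.sys_pct:.0f}% above {SYS_WIO_CEILING_PCT:.0f}%: OS overhead")
    if s.wio_pct > SYS_WIO_CEILING_PCT:
        warnings.append(f"%wio {s.wio_pct:.0f}% above {SYS_WIO_CEILING_PCT:.0f}%: I/O waits")
    limit = RUNQ_MULTIPLIER * s.num_cpus
    if s.run_queue > limit:
        warnings.append(f"run queue {s.run_queue:g} above {limit} (2x CPUs)")
    return warnings


if __name__ == "__main__":
    # A fully busy 8-CPU box that still meets the guidelines (made-up numbers).
    print(check_interval(Interval(sys_pct=9, wio_pct=4, run_queue=12, num_cpus=8)))
```

An empty warning list on a box with 0% idle is exactly the case the post argues is fine: the system is using the capacity it was sized for.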

What to look for?



The %SYS check determines whether the OS itself is using too much CPU to do its job. The %WIO check determines whether many processes are waiting for their I/O requests to be serviced. High values in either are not desirable. %SYS tends to be high when the OS performs too many context switches or, say, when the paging daemon has gone crazy. %WIO tends to be high when processes are bottlenecked on the storage device.
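
To apply this to a whole day of data, a sketch like the one below parses "sar -u" output and labels each interval with the likely cause described above. It assumes the Solaris/HP-UX column layout (%usr %sys %wio %idle). Linux sysstat labels the columns differently (for example %system and %iowait), so the keys would need adjusting there. The sample output is made up for illustration, and `parse_sar_u` and `diagnose` are hypothetical helper names.

```python
# Made-up "sar -u" output in the Solaris/HP-UX column layout (%usr %sys %wio %idle).
SAMPLE_SAR_U = """\
SunOS dbhost 5.8 Generic sun4u    05/20/03

09:00:00    %usr    %sys    %wio   %idle
09:20:00      78      12       6       4
09:40:00      41      45       9       5
10:00:00      55      10      30       5

Average       58      22      15       5
"""

CEILING_PCT = 15.0  # upper end of the 10-15% guideline


def parse_sar_u(text):
    """Return (time, {column: value}) per interval line, skipping the Average line."""
    rows, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "%usr" in fields:
            columns = fields[1:]  # column names that follow the timestamp
            continue
        if columns is None or fields[0] == "Average" or len(fields) != len(columns) + 1:
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # not a data line
        rows.append((fields[0], dict(zip(columns, values))))
    return rows


def diagnose(rows, ceiling=CEILING_PCT):
    """Map intervals with high %sys or %wio to the likely cause."""
    findings = []
    for time, cols in rows:
        if cols["%sys"] > ceiling:
            findings.append((time, "high %sys: OS overhead (context switches, paging?)"))
        if cols["%wio"] > ceiling:
            findings.append((time, "high %wio: processes bottlenecked on storage"))
    return findings


if __name__ == "__main__":
    for time, finding in diagnose(parse_sar_u(SAMPLE_SAR_U)):
        print(time, finding)
```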

The run-queue check is the most reliable method I know of for detecting CPU bottlenecks, because it shows the number of processes queued and waiting for CPU service. When this number is consistently above 2x the number of CPUs, you can assume that CPU starvation is beginning to happen.
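
One way to turn "consistently" into something testable is sketched below. This is a minimal sketch under assumptions, not a standard definition. It treats the run queue as consistently high when it exceeds 2x the CPU count in at least 80% of samples, and that 80% figure is an arbitrary placeholder. It also drops the first vmstat data line, which on Solaris and Linux reports averages since boot. The sample output is made up, and the function names are hypothetical.

```python
import os

# Made-up Solaris-style vmstat output from an 8-CPU box; columns after "w" trimmed.
SAMPLE_VMSTAT = """\
 kthr
 r b w
 1 0 0
 20 1 0
 18 0 0
 19 2 0
 3 0 0
 21 1 0
"""


def run_queue_samples(text):
    """Return the "r" (first) column from each numeric vmstat line.

    The first data line reports averages since boot, so it is dropped.
    """
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            samples.append(int(fields[0]))
    return samples[1:]


def cpu_starved(samples, num_cpus=None, min_fraction=0.8):
    """True when the run queue exceeds 2x CPUs in at least min_fraction of samples.

    min_fraction is a hypothetical stand-in for "consistently".
    """
    if not samples:
        return False
    num_cpus = num_cpus or os.cpu_count() or 1
    limit = 2 * num_cpus
    over = sum(1 for r in samples if r > limit)
    return over / len(samples) >= min_fraction


if __name__ == "__main__":
    print(cpu_starved(run_queue_samples(SAMPLE_VMSTAT), num_cpus=8))
```

A single spike above the limit does not trip this check. That is deliberate, because the post's concern is a sustained queue rather than a momentary one.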

I have monitored and observed large systems where %IDLE was 0% all day (during peak times). Jobs were getting done, and people were performing their tasks at the required levels. Nobody was complaining of performance problems. Granted, the system could not have absorbed significant additional load, but then again, it was designed for "peak loads".

I personally think that the notion of having "spare CPU capacity" is something of a "warm fuzzy". It gives you a good feeling, but there is usually no technical drawback or performance issue with a 0% IDLE system, provided the above conditions are met. There is really no ideal number for %IDLE, or let's say I have yet to find a reasonable number that makes sense...oh, I take that back....42...;-) (tongue in cheek)

What causes high CPU usage?



On systems where the CPU numbers are high, the focus should be on one question: "Who is consuming all of these resources?" After all, performance management should be about eliminating wasteful use of resources.

If you find that a handful of Oracle processes are consuming large amounts of CPU, then you know who the resource hogs are. Use "top" to get their process IDs, and then trace those processes within Oracle. Find the SQL they run and fix the "core problem" - LOGICAL I/O. If you can rewrite the SQL, more power to you. If all processes are consuming approximately equal amounts of CPU, chances are there is no single resource hog, and the problem is a bit harder to detect. Use event 10046 tracing discreetly to sample a few Oracle processes and determine their usage patterns and SQL execution.
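
The decision between "a handful of hogs" and "evenly spread" can be sketched as follows. This is a minimal sketch with hypothetical thresholds. It calls the load concentrated when the top 3 processes account for at least half of the observed CPU, and otherwise it picks a small random sample of processes to trace. The input is assumed to be %CPU per Oracle server process as read from top. The helper also prints the SQL*Plus ORADEBUG commands (run as SYSDBA) that enable event 10046 for a given OS process ID. Whether ORADEBUG is acceptable on your site is your call. If you want the Oracle session first, v$process.spid holds the OS process ID.

```python
import random


def pick_trace_targets(cpu_by_pid, top_n=3, share=0.5, sample_size=3, seed=None):
    """Return ("hogs", pids) if a few processes dominate, else ("sample", pids).

    top_n, share and sample_size are hypothetical thresholds, not Oracle defaults.
    """
    ranked = sorted(cpu_by_pid, key=cpu_by_pid.get, reverse=True)
    total = sum(cpu_by_pid.values())
    top = ranked[:top_n]
    if total > 0 and sum(cpu_by_pid[p] for p in top) / total >= share:
        return "hogs", top
    # No single hog: trace a few randomly chosen processes, discreetly.
    rng = random.Random(seed)
    return "sample", rng.sample(ranked, min(sample_size, len(ranked)))


def oradebug_commands(os_pid, level=8):
    """SQL*Plus ORADEBUG lines to enable event 10046 for one OS process.

    Level 8 adds wait events; level 12 adds bind values as well.
    Turn it off afterwards with: ORADEBUG EVENT 10046 TRACE NAME CONTEXT OFF
    """
    return [
        f"ORADEBUG SETOSPID {os_pid}",
        f"ORADEBUG EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL {level}",
        "ORADEBUG TRACEFILE_NAME",
    ]


if __name__ == "__main__":
    # Made-up %CPU per Oracle server process, as read from top.
    busy = {4101: 92.0, 4102: 88.0, 4103: 85.0, 4104: 3.0, 4105: 2.5}
    mode, pids = pick_trace_targets(busy)
    print(mode, pids)
    for pid in pids:
        print("\n".join(oradebug_commands(pid)))
```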

Over-sized SGAs usually push systems into "memory starvation". In my experience these have been the culprits of excess CPU consumption, because in those cases the paging daemon was completely overworked. When %SYS is 45%, you know the OS has gone berserk.
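
A rough budget check can catch an over-sized SGA before the paging daemon does. This is a simplified sketch under an assumed memory model: SGA, plus the number of server processes times a per-process footprint, plus a reserve for the OS, compared against physical memory. The per-process footprint and OS reserve are placeholders you would measure on your own system. The function name is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def memory_headroom(physical, sga, processes, per_process, os_reserve):
    """Bytes left after SGA, server processes and an OS reserve.

    A negative result means the box is overcommitted and will page at peak.
    Simplified model (assumption): ignores file-system cache and other software.
    """
    return physical - (sga + processes * per_process + os_reserve)


if __name__ == "__main__":
    # Made-up figures: 8 GiB box, 6 GiB SGA, 300 processes at ~5 MiB each.
    left = memory_headroom(physical=8 * GIB, sga=6 * GIB, processes=300,
                           per_process=5 * MIB, os_reserve=1 * GIB)
    print(f"headroom: {left / MIB:.0f} MiB")
```

With these made-up figures the headroom comes out at -476 MiB. In other words, the SGA alone leaves too little room for the server processes and the OS, which is the situation described above.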

Things to do?



Bottom line - reduce parsing (hard and soft) where possible, reduce logical I/O, and don't over-size your SGAs. Prior to 9i, set large OPTIMAL sizes for rollback segments so that extents are not allocated and de-allocated frequently. Use locally-managed tablespaces, or in pre-8i releases use tablespaces with equal-sized extents. All of these will help reduce CPU consumption on your box.
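
To see whether parsing and logical I/O actually went down after a fix, one option is to compare two snapshots of v$sysstat taken some seconds apart. The statistic names below exist in v$sysstat. How you fetch them (SQL*Plus, a driver, etc.) is left out, the snapshot values in the usage example are made up, and `workload_deltas` is a hypothetical helper name.

```python
# Query to run at the start and end of the measurement interval.
SNAPSHOT_SQL = """
SELECT name, value FROM v$sysstat
 WHERE name IN ('parse count (total)', 'parse count (hard)',
                'execute count', 'session logical reads')
"""


def workload_deltas(before, after, seconds):
    """Per-second rates and ratios from two v$sysstat snapshots (name -> value)."""
    d = {name: after[name] - before[name] for name in before}
    parses = d["parse count (total)"]
    hard = d["parse count (hard)"]
    execs = d["execute count"]
    return {
        "hard_parses_per_sec": hard / seconds,
        # Close to 1.0 means nearly every execution is preceded by a parse.
        "parses_per_exec": parses / execs if execs else 0.0,
        "hard_parse_pct": 100.0 * hard / parses if parses else 0.0,
        "logical_reads_per_sec": d["session logical reads"] / seconds,
    }


if __name__ == "__main__":
    # Made-up snapshots taken 60 seconds apart.
    before = {"parse count (total)": 1000, "parse count (hard)": 100,
              "execute count": 2000, "session logical reads": 50000}
    after = {"parse count (total)": 7000, "parse count (hard)": 700,
             "execute count": 8000, "session logical reads": 650000}
    for name, value in workload_deltas(before, after, 60).items():
        print(f"{name}: {value:.2f}")
```

Running the same measurement before and after rewriting SQL or adding bind variables gives a direct read on whether consumption, and therefore CPU, has gone down.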

In the end, when you reduce consumption, you will gain capacity back and make your environment more scalable.

How do I estimate CPU for my box?



CPU capacity can be estimated using many statistical and numerical methods. The cheapest and most common is "The Ratio Modeling Technique". Craig Shallahamer et al. wrote a paper on it; check out http://www.orapub.com. It is a low-precision method for calculating CPU system capacity, but I'd take "low-precision" over "no-precision", which is usually worse.
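
The general shape of a ratio model can be sketched as follows. This is a simplified illustration of the idea as I understand it, not the paper's exact procedure. Each workload category gets a ratio, meaning the units of that work one fully busy CPU can sustain, calibrated from a system you have observed. Required CPUs are then the sum of units divided by ratio, rounded up. The category names, the calibration figures and the batch ratio are all placeholders, not values from the paper.

```python
import math


def calibrate_ratio(units, num_cpus, cpu_utilization):
    """Units of work one fully busy CPU sustained on an observed system."""
    return units / (num_cpus * cpu_utilization)


def cpus_required(workload, ratios):
    """Sum units / ratio over workload categories and round up to whole CPUs."""
    return math.ceil(sum(units / ratios[cat] for cat, units in workload.items()))


if __name__ == "__main__":
    # Placeholder calibration: 300 OLTP users kept 4 CPUs 75% busy on an existing box.
    ratios = {
        "oltp_users": calibrate_ratio(300, num_cpus=4, cpu_utilization=0.75),
        "batch_jobs": 1.0,  # placeholder: one concurrent batch job per CPU
    }
    print(cpus_required({"oltp_users": 450, "batch_jobs": 3}, ratios))
```

The precision of the answer is only as good as the ratios, which is why the method is low-precision. Still, it gives a defensible peak-load number that the 100%-at-peak argument above depends on.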

Hope this helps,

Gaja

<<lots of stuff deleted...sorry>>




Author: Gaja Krishna Vaidyanatha
  INET: oraperfman_at_yahoo.com

Received on Tue May 20 2003 - 20:21:37 CDT

