Oracle FAQ Your Portal to the Oracle Knowledge Grid

Home -> Community -> Usenet -> c.d.o.server -> Re: [C.D.O.S][Long...] How a DBA should handle a load-average-spike?

Re: [C.D.O.S][Long...] How a DBA should handle a load-average-spike?

From: DA Morgan <>
Date: Sun, 17 Dec 2006 13:36:15 -0800
Message-ID: <>

Peter Smith [gjfc] wrote:
> People,
> I'm trying to write a simple document on how a DBA should handle
> a host which is experiencing a load-average-spike (and
> negatively impacting end-user response time).
> Do any of you have some common sense you would like to share?
> Here are some ideas which come to my mind:
> -Try to characterize the current load using:
> -the 'ps' command.
> -the Targets -- Hosts -- Performance Page in EM
> Load-type-1: Is the load-average-spike caused by a single process or a
> small
> number of processes which are 'hogging' the CPU?
> Load-type-2: Is the load-average-spike caused by Oracle background
> processes which are 'hogging' the CPU?
> Load-type-3: Is the load-average-spike caused by Oracle foreground
> processes which are 'hogging' the CPU?
> Load-type-4: Is the load-average-spike caused by an abnormally large
> number of
> processes which have recently been added to the process list?
> If we are dealing with Load-type-1, we need to trace the processes
> to a user or business-process which spawned the process(es).
> Then, we implement procedures to rein-in or disconnect this user.
> This is a difficult situation because risks of doing the "wrong thing"
> are high. You don't want to disconnect a user who is conducting a
> legitimate business-critical task.
> If we are dealing with Load-type-2, we need to drill down into the
> Oracle
> background processes to find out what is keeping it overly busy.
> EM can help you with this:
> Home -- Targets -- Databases -- TheBusyDatabase -- Active Sessions with
> High CPU.
> In this screen you can see some clues about what is driving each of the
> busy sessions.
> When you drill down into Oracle background processes, one of the first
> layers you
> encounter are sessions.
> Your goal is to gather enough information to pinpoint the
> application(s) which are driving these sessions. Then, you interact
> with an administrative interface on the application to search for
> end-users who might be responsible. Or you might then find
> information about the application's configuration or state which is
> abnormal and needs to be fixed. For example, sometimes public facing
> websites can be pushed hard by mal-ware Perl scripts operated by
> delinquents to generate high volumes of traffic (to crash the site).
> A poorly constructed application might filter (or magnify) some of
> this traffic to a host for processing.
> One level below each session are individual SQL statements. EM offers
> functionality to look at each SQL statement in a session. If you are
> confident that the load on the DB is legitimate, an obvious next place
> to look for clues are the SQL statements. Tuning SQL statements is a
> vast subject but EM will quickly help you find SQL statements which
> consume relatively large amounts of CPU. Also EM contains a 'SQL
> Tuning Advisor'. It is sophisticated enough to help with a wide
> variety of poorly constructed SQL statements or schemas. For example,
> it could tell you if a new index would speed up a query and if so
> how much that index would slow down a corresponding insert.
> If we are dealing with Load-type-3, we need to...[ complete later ].
> If we are dealing with Load-type-4, we need to search for
> characteristics
> of the process set which yield clues about the deluge.
> Some simple questions might be a good start:
> -Are most of the processes owned by one userid?
> -Are most of the processes linked to the same executable?
> -Do we have source code for the underlying executable?
> -When were most of the processes created?
> -Are they all consuming CPU cycles?
> -Are they connected to anything:
> -reading/writing files?
> -connected to the DB?
> -network connected to anything?
> The goal here is to find the application or person who started all
> these processes.
> Another goal is to find out if they are obstructed by anything and thus
> prevented
> from dying since they cannot finish their jobs.
> So, that is my initial take on the high-load-average-spike scenario.
> Do any of you have any comments which would be useful for the
> DBA who needs a general set of guidelines for dealing with this
> situation?
> ...Peter

Is this a version of Oracle with the Resource Management such as 9i or 10g? If so I would think the appropriate thing to do is refine one's policies to make it 'impossible' for it to happen again.

Installing the Grid Control and monitoring resources as well as using the ADDM should complete the picture.

Daniel A. Morgan
University of Washington
(replace x with u to respond)
Puget Sound Oracle Users Group
Received on Sun Dec 17 2006 - 15:36:15 CST

Original text of this message