Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> RE: Capacity Planner from OEM VS Statspack

RE: Capacity Planner from OEM VS Statspack

From: Cary Millsap <cary.millsap_at_hotsos.com>
Date: Thu, 5 Feb 2004 22:08:01 -0600
Message-ID: <003e01c3ec66$d39df670$6a01a8c0@CVMLAP02>


I agree that looking backward and explaining aggregated data after comprehending the details is valuable.

The problem with aggregated data--THE problem--is that a single effect can be explained by many, many possible causes. Determining the correct cause sometimes looks really easy when it's not. Determining the cause requires either (1) that you acquire more data (e.g., the detailed data that I argue you should get *first*), or (2) that you guess.

Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com
* Nullius in verba *

Upcoming events:

- Performance Diagnosis 101: 2/24 San Diego, 3/23 Park City, 4/6 Seattle
- SQL Optimization 101: 2/16 Dallas
- Hotsos Symposium 2004: March 7-10 Dallas
- Visit www.hotsos.com for schedule details...


-----Original Message-----
From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of John Kanagaraj Sent: Thursday, February 05, 2004 7:21 PM To: 'oracle-l_at_freelists.org'
Subject: RE: Capacity Planner from OEM VS Statspack

Dennis (and Babette),

Mladen pointed this out, and I would like to add to this. V$SESSION_WAIT is
a great tool to see what's going on _now_ when you have a _system-wide_ issue (as in Dennis's case). I found the following SQL very useful and provided the starting point for a great many troubleshooting episodes:

Select event, count(*)
>From v$session_wait

Group by event;

The other *gem* is the 'waiters' script from Steve Adams. Both of them will
specifically point out latching and file i/o issues _system-wide_. For eg, I
have seen 'near-hangs' (or Database freezes) when the lib cache latch is taken out by a large number of processes, or the shared pool latch is held
by a single process for quite a while. [I had a situation where a cron script woke up every 4 hours to grant select privileges to a read-only account for all the 10,000+ tables in the Db - the stress on the lib cache
was just way too much, especially at times of high activity, and these views, along with other v$ views correctly showed up this problem). The CPUs
were spinning on lib cache latches, and this made the SA sit up and notice
as well [and lent credibility to my arguments].

If you have a _specific_ program issue, (as with Babette), then a 10046 is
an ideal tool for 'targetting' the right session for the right scope of time
and actions [And Cary does a great job of explaining this in his book].

The issue that I have with STATSPACK in this particular situation is this:
It collects way too much information (which during a 'near-hang' situation
can be slow and add unnecessary overehad, and reports it all in a form that
can at times be confusing (which hides real clues which might have otherwise
been readily apparent). Granted, it does collect snapshots of important V$
information, but you need to know what to look for...

OTOH, if we could just take the good parts and pare STATSPACK down, the collected (compact) data might still be a useful tool (assuming you know what to look for and have configured it correctly). Deviations from baseline
should trigger off questions to your Business Analysts who can then trace
down the specific function/user/program on which you can apply the 10046 gun
so very well described by Cary.

[I think] The point this list has collectively tried to make in this long-running thread is this:

You cannot interpolate the details from a summary. (as in - obtaining 10046-based, time and targetted action scoped, trace details is not possible
from STATSPACK). At the same time, STATSPACK does have its place in baselining, capacity planning and showing up Database/Instance level issues
(if you know how to read it).

Quoting from the most famous personality in Luke 11:42 "You should have practiced the latter without leaving the former undone", I would suggest that we should use targetted 10046 tracing without ignoring the 'after-the-fact', historical summary information that STATSPACK can provide.
[In other words, every dog has its day!]

John Kanagaraj
DB Soft Inc
Phone: 408-970-7002 (W)

Grace - Getting something we do NOT deserve Mercy - NOT getting something we DO deserve Click on 'http://www.needhim.org' for Grace and Mercy that is freely available!

>-----Original Message-----
>From: DENNIS WILLIAMS [mailto:DWILLIAMS_at_LIFETOUCH.COM]
>Sent: Thursday, February 05, 2004 12:54 PM
>To: 'oracle-l_at_freelists.org'
>Subject: RE: Capacity Planner from OEM VS Statspack
>
>
>I agree with Babette. I have had some situations where the
>database "hung".
>If you are the only DBA, things get hairy fast. Phone ringing
>off the hook,
>people standing around your desk asking questions like "should
>we bounce
>Oracle?". Metalink has a nice note on hanging, by the way. You
>don't know
>how long the situation will continue. Will it clear up by
>itself in a few
>minutes, or should you just bounce the database so everyone
>can get back to
>work? Usually you can't diagnose the problem on the spot. I
>found GUI tools
>poor in this situation because the GUI itself gets really slow and now
>you're ignoring your bosses to peer into your screen.
> Yes, if the problem recurs often enough, a 10046 trace may
>be great. But
>your goal is to diagnose the problem without having the
>problem recur. All
>your system queries are returning really slowly so figuring out which
>process is at fault (if it is only one) takes an excruciating
>amount of time
>under pressure because the queries return realllllly slooooly.
> I found STATSPACK great in the database hanging situation.
>You just take
>a snapshot, calm the people for 5 minutes, then take another
>snapshot. When
>people ask whether the database should be bounced, you can
>reply that you
>want to take another snapshot in 5 minutes and if the situation hasn't
>cleared by then, we'll bounce it. By default STATSPACK will
>collect enough
>information to allow you to drill into the situation and diagnose the
>problem.
> Nothing irritates management like saying "I don't know why
>the situation
>occurred, I couldn't collect enough information to diagnose
>the problem and
>prevent it from recurring".
> Oh, one correction for Babette -- I would say 4 things: CPU, disk,
>network, and memory.
>
>Dennis Williams
>DBA
>Lifetouch, Inc.
>dwilliams_at_lifetouch.com
>
>-----Original Message-----
>From: Jared.Still_at_radisys.com [mailto:Jared.Still_at_radisys.com]
>Sent: Thursday, February 05, 2004 2:33 PM
>To: oracle-l_at_freelists.org
>Subject: RE: Capacity Planner from OEM VS Statspack
>
>
>Babette,
>Have you done a 10046 trace on this?
>
>Jared
>
>
>
>
>
>
><babette.turnerunderwood_at_hrdc-drhc.gc.ca>
>Sent by: oracle-l-bounce_at_freelists.org
> 02/05/2004 11:47 AM
> Please respond to oracle-l
>
>
> To: <oracle-l_at_freelists.org>
> cc:
> Subject: RE: Capacity Planner from OEM VS Statspack
>
>
>It is worse than that .....
>
>EVERYONE has noticed that at times the performance is abysmally slow.
>BUT according to all of the mainframe reporting information=20
>Everything is fine... No swapping, no paging, no disk bottleneck, no =
>memory problems, no CPU problems.....
>
>For instance, full tablescan (no indexes) to update a NULL column to =
>NULL
>on 1 Million rows. Can take up to three times as long at times.
>BUT System people insist there is nothing being pushed at the system =
>level.
>The CPU is not maxed out, the disks have no bottlenecks or contention =
>and there are no memory problems at this time.
>
>There are only three things, CPU, DISK, Network.
>There has to be something wrong with at least one of them to
>be getting =
>the weird sporadic performance that we get.
>
>It is hard to get overall picture of health of machine in MF =
>environment.
>We have Logical machines (LPARs) on a single physical box.
>I know that I/O on other LPARs can affect our I/O, but we are
>told that =
>there are no problems according to the system records.
>
>The statspack information is a bonus. We have SOMETHING that
>we can say =
>"explain this". Still waiting on explanation for a few weeks now...
>
>Babette
>
>-----Original Message-----
>From: oracle-l-bounce_at_freelists.org =
>[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Niall Litchfield
>Sent: 2004-02-03 3:21 PM
>To: oracle-l_at_freelists.org
>Subject: RE: Capacity Planner from OEM VS Statspack
>
>
>> I agree with Ian.... Sometimes Statspack is VERY useful..
>>=3D20
>> In our case the Statspack reports shows ave read times of=3D20
>> 1-10ms. However we occasionally see read times of 300-700 ms.
>>=3D20
>> We are currently investigating what is on the slower disks,=3D20
>> What systems are sharing them, and whether oracle is=3D3D20=3D20
>> chaining I/O requests and giving false stats or if there=3D20
>> really is a =3D3D problem. (Hey, on OS/390 mainframe system we=3D20
>> don't get iostat / sar / vmstat / =3D3D
>> top)
>>=3D20
>> This top-down approach doesn't address any SPECIFIC=3D20
>> performance proble. BUT ... if we didn't have Statspack=3D20
>> running periodically, we might have =3D3D missed this.
>>=3D20
>> - Babette
>
>I think the interesting question here is 'If you had missed this, would
>anyone care?' and its corollary 'now you have caught it, does
>anyone =3D
>care?'.
>Now I admit that I have a biased view in that all anyone ever seems to
>complain to me about is 'Screen X is running slow' or 'we
>can't complete =
>=3D
>our
>management reports overnight' or 'I'm not a dba so your presentation on
>managing databases that I chose to attend was irrelevant' -
>oops sorry =
>=3D
>not
>that last one. Almost never does anyone whinge that 'the system is =3D
>slow', or
>at least when they do they have a specific example in mind. As
>a result =
>=3D
>I am
>definitely biased towards a view that systems don't experience
>problems =
>=3D
>-
>processes do. I *suspect* that even where the *system* is slow then =3D
>actually
>it will be fewer than 5 processes that are killing it, but have no =3D
>proof.=3D20
>
>Niall
>
>----------------------------------------------------------------
>Please see the official ORACLE-L FAQ: http://www.orafaq.com
>----------------------------------------------------------------
>To unsubscribe send email to: oracle-l-request_at_freelists.org
>put 'unsubscribe' in the subject line.
>--
>Archives are at http://www.freelists.org/archives/oracle-l/
>FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
>-----------------------------------------------------------------
>----------------------------------------------------------------
>Please see the official ORACLE-L FAQ: http://www.orafaq.com
>----------------------------------------------------------------
>To unsubscribe send email to: oracle-l-request_at_freelists.org
>put 'unsubscribe' in the subject line.
>--
>Archives are at http://www.freelists.org/archives/oracle-l/
>FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
>-----------------------------------------------------------------
>
>
>
>
>----------------------------------------------------------------
>Please see the official ORACLE-L FAQ: http://www.orafaq.com
>----------------------------------------------------------------
>To unsubscribe send email to: oracle-l-request_at_freelists.org
>put 'unsubscribe' in the subject line.
>--
>Archives are at http://www.freelists.org/archives/oracle-l/
>FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
>-----------------------------------------------------------------
>----------------------------------------------------------------
>Please see the official ORACLE-L FAQ: http://www.orafaq.com
>----------------------------------------------------------------
>To unsubscribe send email to: oracle-l-request_at_freelists.org
>put 'unsubscribe' in the subject line.
>--
>Archives are at http://www.freelists.org/archives/oracle-l/
>FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
>-----------------------------------------------------------------
>



Please see the official ORACLE-L FAQ: http://www.orafaq.com

To unsubscribe send email to: oracle-l-request_at_freelists.org put 'unsubscribe' in the subject line.
--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request_at_freelists.org
put 'unsubscribe' in the subject line.
--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
Received on Thu Feb 05 2004 - 22:08:01 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US