Return-Path: <oracle-l-bounce@freelists.org>
Delivered-To: 2-oracle-l@orafaq.com
Received: (qmail 29233 invoked from network); 6 Jun 2006 18:50:22 -0500
Received: from freelists-180.iquest.net (HELO turing.freelists.org) (206.53.239.180)
  by 69.64.49.119 with SMTP; 6 Jun 2006 18:50:13 -0500
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 2898534DC65;
 Tue,  6 Jun 2006 19:50:10 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 31398-06; Tue, 6 Jun 2006 19:50:10 -0400 (EDT)
Received: from turing (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 9426F34D013;
 Tue,  6 Jun 2006 19:50:09 -0400 (EDT)
Received: with ECARTIS (v1.0.0; list oracle-l); Tue, 06 Jun 2006 19:49:26 -0400 (EDT)
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id D7C1C34D358
 for <oracle-l@freelists.org>; Tue,  6 Jun 2006 19:49:25 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 31252-09 for <oracle-l@freelists.org>;
 Tue, 6 Jun 2006 19:49:25 -0400 (EDT)
Received: from outbound2-red-R.bigfish.com (outbound-red.frontbridge.com [216.148.222.49])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 3C5F834D6B5
 for <oracle-l@freelists.org>; Tue,  6 Jun 2006 19:49:23 -0400 (EDT)
Received: from outbound2-red.bigfish.com (localhost.localdomain [127.0.0.1])
 by outbound2-red-R.bigfish.com (Postfix) with ESMTP id 4EB99BA88B5
 for <oracle-l@freelists.org>; Tue,  6 Jun 2006 23:49:21 +0000 (UTC)
Received: from mail71-red-R.bigfish.com (unknown [172.18.12.3])
 by outbound2-red.bigfish.com (Postfix) with ESMTP id 42B68BA86B6
 for <oracle-l@freelists.org>; Tue,  6 Jun 2006 23:49:21 +0000 (UTC)
Received: from mail71-red.bigfish.com (localhost.localdomain [127.0.0.1])
 by mail71-red-R.bigfish.com (Postfix) with ESMTP id 3B98C4EBA9B
 for <oracle-l@freelists.org>; Tue,  6 Jun 2006 23:49:21 +0000 (UTC)
X-BigFish: V
Received: by mail71-red (MessageSwitch) id 1149637761126802_15385; Tue,  6 Jun 2006 23:49:21 +0000 (UCT)
Received: from spamsmtp1.weyerhaeuser.com (unknown [208.247.148.35])
 by mail71-red.bigfish.com (Postfix) with ESMTP id E602C4EB447
 for <oracle-l@freelists.org>; Tue,  6 Jun 2006 23:49:20 +0000 (UTC)
Received: from wafedixsmtp11.corp.weyer.pri ([10.31.65.96]) by spamsmtp1.weyerhaeuser.com with Microsoft SMTPSVC(5.0.2195.5329);
	 Tue, 6 Jun 2006 16:42:50 -0700
Received: from wafedixm10.corp.weyer.pri ([10.82.241.163]) by wafedixsmtp11.corp.weyer.pri with Microsoft SMTPSVC(6.0.3790.211);
	 Tue, 6 Jun 2006 16:49:18 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C689C3.D2C4892B"
Subject: RE: Statspack ratios help
Date: Tue, 6 Jun 2006 16:49:17 -0700
Message-ID: <65C0D8935651CB4D96E97CEFAC5A12B9022F1404@wafedixm10.corp.weyer.pri>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Statspack ratios help
Thread-Index: AcaJoPC2rWO9mSIlTC+4fTA0GhwhcQAAA9ZAAAg3vNA=
From: "Khemmanivanh, Somckit" <somckit.khemmanivanh@weyerhaeuser.com>
To: <Mark.Bobak@il.proquest.com>,
 <bunjibry@gmail.com>,
 "Oracle-L" <oracle-l@freelists.org>
X-OriginalArrivalTime: 06 Jun 2006 23:49:18.0518 (UTC) FILETIME=[D3699D60:01C689C3]
X-archive-position: 35682
X-ecartis-version: Ecartis v1.0.0
Sender: oracle-l-bounce@freelists.org
Errors-to: oracle-l-bounce@freelists.org
X-original-sender: somckit.khemmanivanh@weyerhaeuser.com
Precedence: normal
Reply-to: somckit.khemmanivanh@weyerhaeuser.com
X-list: oracle-l
X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at avenirtech.net
------_=_NextPart_001_01C689C3.D2C4892B
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

>>>The thing to remember is that a hit ratio does not reliably correlate
to performance, and therefore is NOT a valid performance metric.  If you
approach a database that is having performance problems, look at a hit
ratio, and determine it's "too low", you're likely to spin your wheels
trying to "fix" the hit ratio by improving it, but miss the root cause
of the performance problem.<<<
=20

I'll have to disagree with that somewhat. Consider that case of index
scans that are serviced mainly from the DB Buffer Cache.=20

Your hit ratio will be good because the query(ies) are continually
hitting the cache and not reading from disk.

Once you start getting cache misses then your performance drops as
well....

I know what most of the response will be -- go tune the code to be more
efficient...of course that is the ideal solution but in reality that is
not always a possilbility...
=20
So you are left with things like trying to buffer as much of the data as
possible (i.e. big DB cache sizes, keep pools, etc..)
=20
I'm not saying you don't have a valid point (most of it is) but there
are always exceptions...

________________________________

From: oracle-l-bounce@freelists.org
[mailto:oracle-l-bounce@freelists.org] On Behalf Of Bobak, Mark
Sent: Tuesday, June 06, 2006 1:08 PM
To: bunjibry@gmail.com; Oracle-L
Subject: RE: Statspack ratios help


Bryan,
=20
Never be afraid to ask a question...and there are no stupid
questions...only stupid answers.  I shall endeavor to ensure this is not
a stupid answer.
=20
There is nothing inherently wrong with hit ratios.  The problem is when
people rely on hit ratios as a valid performance metric.  If you want to
monitor hit ratios and you happen to notice a precipitous drop in a
particular hit ratio, well, perhaps there's something there....perhaps.
=20
The thing to remember is that a hit ratio does not reliably correlate to
performance, and therefore is NOT a valid performance metric.  If you
approach a database that is having performance problems, look at a hit
ratio, and determine it's "too low", you're likely to spin your wheels
trying to "fix" the hit ratio by improving it, but miss the root cause
of the performance problem.
=20
A few words about 100% buffer cache hit ratio.  This is actually often a
sign of trouble, perhaps big trouble.  The theory goes like this "disk
I/O is X times slower than memory access (pick your favorite number for
X, 100, 1000, 10000, whatever) so therefore, you want all your I/O to be
logical (from the buffer cache) thereby eliminating all that slow disk
I/O."  Sounds good, right?  There are some things you have to
remember.....Oracle's buffer cache is part of the SGA, which is a shared
memory segment, which all the Oracle server processes share access to.
In order to ensure the integrity of that data in shared memory, Oracle
has some pretty elaborate locking mechanisms.  I won't go into details,
but, very breifly (and incompletely) a "latch" is a small chunk of
memory which serializes access to a specific part of the SGA.  The
buffer cache is protected by latches and buffer locks that make sure
that while one process is reading data from a particular block, another
process can't overwrite that buffer.
=20
So, when you consider the serialization overhead of Oracle's SGA, the
difference in cost between a logical I/O and a physical I/O is nowhere
near 10000 or 1000 or even 100.  If I recall correctly, Cary Millsap
published a number, based on huge amounts of empirical data (trace
files) that calculated it to be something like a factor of 37.  Ok,
that's still 37x slower!  That's worth eliminating, right??
=20
Ok, next, a little change in focus here.  There is no problem if a
particular well-tuned query has a hit ratio of 100%.  In fact, most
anyone would agree that this is a good thing.  So, what's the problem
with a 100% (or nearly 100%) buffer cache hit ratio instance wide??
Well, maybe nothing....maybe.  But it's also very possible to have an
instance with a 100% buffer cache hit ratio that is so bottlenecked that
noone can get any use out of it.  The problem lies in the serialization
I mentioned before.  Even if you have a 100% buffer cache hit ratio,
every buffer cache access is serialized through latches, buffer locks,
and buffer pins.  This is necessary for Oracle to maintain data
integrity.  So, what happens if you have a query that (for example) uses
an index very inefficiently?  You have a query that hits the same blocks
over and over again.  Because they are repeatedly hit, they remain in
the buffer cache, so, no disk I/O!  Yay!  But, the serialization is
killing you.  What happens when you run several instances of these
inefficient queries concurrently?  They will push your buffer cache hit
ratio well past 99.99%, but, the system will be suffering.  The
serialization invlolved in locking and latching will bring the system to
it's knees.  So, in this case, you have the "ideal" BCHR, but your
system will run like a dog.
=20
The most effective way to determine the root cause of a performance
problem is to specifically analyze the process that's having the
problem.  Look at the performance profile.  Where is time actually being
spent?  Focus on that!  Tools like the 10046 trace at level 8, the
Oracle wait interface, and StatsPack, can be very useful in determining
the root cause.
=20
-Mark
=20
PS  If you go to http://www.hotsos.com/ and click on Library, Cary
Millsap has a paper there called something like "Why a 99+% buffer cache
hit ratio is bad", and he makes a much more eloquent argument than I do
above.  Lots of other good papers there too....take some time to read
through some of them.  Registration is required, but free.
=20

--=20
Mark J. Bobak=20
Senior Oracle Architect=20
ProQuest Information & Learning=20

For a successful technology, reality must take precedence over public
relations, for Nature cannot be fooled.  --Richard P. Feynman, 1918-1988

.com <mailto:bunjibry@gmail.com> =20

------_=_NextPart_001_01C689C3.D2C4892B
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dus-ascii">
<META content=3D"MSHTML 6.00.2900.2873" name=3DGENERATOR></HEAD>
<BODY>
<DIV dir=3Dltr align=3Dleft>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial><FONT =
color=3D#0000ff><FONT=20
size=3D2><SPAN class=3D966183523-06062006>&gt;&gt;&gt;</SPAN>The thing =
to remember=20
is that a hit ratio does not reliably correlate to performance, and =
therefore is=20
NOT a valid performance metric.&nbsp; If you approach a database that is =
having=20
performance problems, look at a hit ratio, and determine it's "too low", =
you're=20
likely to spin your wheels trying to "fix" the hit ratio by improving =
it, but=20
miss the root cause of the performance problem.<SPAN=20
class=3D966183523-06062006>&lt;&lt;&lt;</SPAN></FONT></FONT></FONT></SPAN=
></DIV></DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2></FONT>&nbsp;</DIV><!-- =
Converted from text/plain format -->
<P><SPAN class=3D966183523-06062006><FONT size=3D2>I'll have to disagree =
with that=20
somewhat. Consider that case of index scans that are serviced mainly =
from the DB=20
Buffer Cache. </FONT></SPAN></P>
<P><SPAN class=3D966183523-06062006><FONT size=3D2>Your hit ratio will =
be good=20
because the query(ies) are continually hitting the cache and not reading =
from=20
disk.</FONT></SPAN></P>
<P><SPAN class=3D966183523-06062006><FONT size=3D2>Once you start =
getting cache=20
misses then your performance drops as well....</FONT></SPAN></P>
<DIV><SPAN class=3D966183523-06062006><FONT face=3DArial size=3D2>I know =
what most of=20
the response will be -- go tune the code to be more efficient...of =
course that=20
is the ideal solution but in reality that is not always a=20
possilbility...</FONT></SPAN></DIV>
<DIV><SPAN class=3D966183523-06062006><FONT face=3DArial=20
size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D966183523-06062006><FONT face=3DArial size=3D2>So you =
are left with=20
things like trying to buffer as much of the data as possible (i.e. big =
DB cache=20
sizes,&nbsp;keep pools, etc..)</FONT></SPAN></DIV>
<DIV><SPAN class=3D966183523-06062006><FONT face=3DArial=20
size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D966183523-06062006><FONT face=3DArial size=3D2>I'm =
not saying you=20
don't have a valid point (most of it is) but there are always=20
exceptions...</FONT></SPAN></DIV><BR>
<DIV class=3DOutlookMessageHeader lang=3Den-us dir=3Dltr align=3Dleft>
<HR tabIndex=3D-1>
<FONT face=3DTahoma size=3D2><B>From:</B> oracle-l-bounce@freelists.org=20
[mailto:oracle-l-bounce@freelists.org] <B>On Behalf Of </B>Bobak,=20
Mark<BR><B>Sent:</B> Tuesday, June 06, 2006 1:08 PM<BR><B>To:</B>=20
bunjibry@gmail.com; Oracle-L<BR><B>Subject:</B> RE: Statspack ratios=20
help<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2>Bryan,</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =
size=3D2>Never=20
be afraid to ask a question...and there are no stupid questions...only =
stupid=20
answers.&nbsp; I shall endeavor to ensure this is not a stupid=20
answer.</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =
size=3D2>There=20
is nothing inherently wrong with hit ratios.&nbsp; The problem is when =
people=20
rely on hit ratios as a valid performance metric.&nbsp; If you want to =
monitor=20
hit ratios and you happen to notice a precipitous drop in a particular =
hit=20
ratio, well, perhaps there's something =
there....perhaps.</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =
size=3D2>The=20
thing to remember is that a hit ratio does not reliably correlate to=20
performance, and therefore is NOT a valid performance metric.&nbsp; If =
you=20
approach a database that is having performance problems, look at a hit =
ratio,=20
and determine it's "too low", you're likely to spin your wheels trying =
to "fix"=20
the hit ratio by improving it, but miss the root cause of the =
performance=20
problem.</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =
size=3D2>A few=20
words about 100% buffer cache hit ratio.&nbsp; This is actually often a =
sign of=20
trouble, perhaps big trouble.&nbsp; The theory goes like this "disk I/O =
is X=20
times slower than memory access (pick your favorite number for X, 100, =
1000,=20
10000, whatever) so therefore, you want all your I/O to be logical (from =
the=20
buffer cache) thereby eliminating all that slow disk I/O."&nbsp; Sounds =
good,=20
right?&nbsp; There are some things you have to remember.....Oracle's =
buffer=20
cache is part of the SGA, which is a shared memory segment, which all =
the Oracle=20
server processes share access to.&nbsp; In order to ensure the integrity =
of that=20
data in shared memory, Oracle has some pretty elaborate locking=20
mechanisms.&nbsp; I won't go into details, but, very breifly (and =
incompletely)=20
a "latch" is a small chunk of memory which serializes access to a =
specific part=20
of the SGA.&nbsp; The buffer cache is protected by latches and buffer =
locks that=20
make sure that while one process is reading data from a particular =
block,=20
another process can't overwrite that buffer.</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =
size=3D2>So,=20
when you consider the serialization overhead of Oracle's SGA, the =
difference in=20
cost between a logical I/O and a physical I/O is nowhere near 10000 or =
1000 or=20
even 100.&nbsp; If I recall correctly, Cary Millsap published a number, =
based on=20
huge amounts of empirical data (trace files) that calculated it to be =
something=20
like a factor of 37.&nbsp; Ok, that's still 37x slower!&nbsp; That's =
worth=20
eliminating, right??</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =
size=3D2>Ok,=20
next, a little change in focus here.&nbsp; There is no problem if a =
particular=20
well-tuned query has a hit ratio of 100%.&nbsp; In fact, most anyone =
would agree=20
that this is a good thing.&nbsp; So, what's the problem with a 100% (or =
nearly=20
100%) buffer cache hit ratio instance wide??&nbsp; Well, maybe=20
nothing....maybe.&nbsp; But it's also very possible to have an instance =
with a=20
100% buffer cache hit ratio that is so bottlenecked that noone can get =
any use=20
out of it.&nbsp; The problem lies in the serialization I mentioned =
before.&nbsp;=20
Even if you have a 100% buffer cache hit ratio, every buffer cache =
access is=20
serialized through latches, buffer locks, and buffer pins.&nbsp; This is =

necessary for Oracle to maintain data integrity.&nbsp; So, what happens =
if you=20
have a query that (for example) uses an index very inefficiently?&nbsp; =
You have=20
a query that hits the same blocks over and over again.&nbsp; Because =
they are=20
repeatedly hit, they remain in the buffer cache, so, no disk I/O!&nbsp;=20
Yay!&nbsp; But, the serialization is killing you.&nbsp; What happens =
when you=20
run several instances of these inefficient queries concurrently?&nbsp; =
They will=20
push your buffer cache hit ratio well past 99.99%, but, the system will =
be=20
suffering.&nbsp; The serialization invlolved in locking and latching =
will bring=20
the system to it's knees.&nbsp; So, in this case, you have the "ideal" =
BCHR, but=20
your system will run like a dog.</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =
size=3D2>The=20
most effective way to determine the root cause of a performance problem =
is to=20
specifically analyze the process that's having the problem.&nbsp; Look =
at the=20
performance profile.&nbsp; Where is time actually being spent?&nbsp; =
Focus on=20
that!&nbsp; Tools like the 10046 trace at level 8, the Oracle wait =
interface,=20
and StatsPack, can be very useful in determining the root=20
cause.</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2>-Mark</FONT></SPAN></DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=3D046014019-06062006><FONT face=3DArial color=3D#0000ff =

size=3D2>PS&nbsp; If you go to <A=20
href=3D"http://www.hotsos.com/">http://www.hotsos.com/</A> and click on =
Library,=20
Cary Millsap has a paper there called something like "Why a 99+% buffer =
cache=20
hit ratio is bad", and he makes a much more eloquent argument than I do=20
above.&nbsp; Lots of other good papers there too....take some time to =
read=20
through some of them.&nbsp; Registration is required, but=20
free.</FONT></SPAN></DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2></FONT>&nbsp;</DIV><!-- =
Converted from text/rtf format -->
<P><B><FONT face=3D"Century Gothic" size=3D2>--</FONT></B> <BR><B><FONT=20
face=3D"Century Gothic" size=3D2>Mark J. Bobak</FONT></B> <BR><B><FONT=20
face=3D"Century Gothic" size=3D2>Senior Oracle Architect</FONT></B> =
<BR><B><FONT=20
face=3D"Century Gothic" color=3D#000000 size=3D2>P</FONT><FONT =
face=3D"Century Gothic"=20
size=3D2>ro</FONT><FONT face=3D"Century Gothic" color=3D#ff0000 =
size=3D2>Q</FONT><FONT=20
face=3D"Century Gothic" size=3D2>uest Information &amp; =
Learning</FONT></B> </P>
<P><FONT face=3D"Courier New" size=3D2>For a successful technology, =
reality must=20
take precedence over public relations, for Nature cannot be =
fooled.&nbsp;=20
--Richard P. Feynman, 1918-1988</FONT></P><A=20
href=3D"mailto:bunjibry@gmail.com">.com</A> </BODY></HTML>

------_=_NextPart_001_01C689C3.D2C4892B--

--
http://www.freelists.org/webpage/oracle-l


