RE: Statspack ratios help

From: ramick <ramick_at_dotster.com>
Date: Tue, 6 Jun 2006 13:38:41 -0700
Message-Id: <20060606203848.CBE0A34D9D2@turing.freelists.org>

While all of this is good information, I don’t necessarily see an indication of a problem…

How long and at what times does the snapshot include? What are the top 5 wait events reported? Are users complaining?
What was it like when it was "fast" or when it worked correctly?

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bobak, Mark
Sent: Tuesday, June 06, 2006 1:08 PM
To: bunjibry_at_gmail.com; Oracle-L
Subject: RE: Statspack ratios help

Bryan,

Never be afraid to ask a question...and there are no stupid questions...only stupid answers. I shall endeavor to ensure this is not a stupid answer.

There is nothing inherently wrong with hit ratios. The problem is when people rely on hit ratios as a valid performance metric. If you want to monitor hit ratios and you happen to notice a precipitous drop in a particular hit ratio, well, perhaps there's something there....perhaps.

The thing to remember is that a hit ratio does not reliably correlate to performance, and therefore is NOT a valid performance metric. If you approach a database that is having performance problems, look at a hit ratio, and determine it's "too low", you're likely to spin your wheels trying to "fix" the hit ratio by improving it, but miss the root cause of the performance problem.

A few words about 100% buffer cache hit ratio. This is actually often a sign of trouble, perhaps big trouble. The theory goes like this "disk I/O is X times slower than memory access (pick your favorite number for X, 100, 1000, 10000, whatever) so therefore, you want all your I/O to be logical (from the buffer cache) thereby eliminating all that slow disk I/O." Sounds good, right? There are some things you have to remember.....Oracle's buffer cache is part of the SGA, which is a shared memory segment, which all the Oracle server processes share access to. In order to ensure the integrity of that data in shared memory, Oracle has some pretty elaborate locking mechanisms. I won't go into details, but, very breifly (and incompletely) a "latch" is a small chunk of memory which serializes access to a specific part of the SGA. The buffer cache is protected by latches and buffer locks that make sure that while one process is reading data from a particular block, another process can't overwrite that buffer.

So, when you consider the serialization overhead of Oracle's SGA, the difference in cost between a logical I/O and a physical I/O is nowhere near 10000 or 1000 or even 100. If I recall correctly, Cary Millsap published a number, based on huge amounts of empirical data (trace files) that calculated it to be something like a factor of 37. Ok, that's still 37x slower! That's worth eliminating, right??

Ok, next, a little change in focus here. There is no problem if a particular well-tuned query has a hit ratio of 100%. In fact, most anyone would agree that this is a good thing. So, what's the problem with a 100% (or nearly 100%) buffer cache hit ratio instance wide?? Well, maybe nothing....maybe. But it's also very possible to have an instance with a 100% buffer cache hit ratio that is so bottlenecked that noone can get any use out of it. The problem lies in the serialization I mentioned before. Even if you have a 100% buffer cache hit ratio, every buffer cache access is serialized through latches, buffer locks, and buffer pins. This is necessary for Oracle to maintain data integrity. So, what happens if you have a query that (for example) uses an index very inefficiently? You have a query that hits the same blocks over and over again. Because they are repeatedly hit, they remain in the buffer cache, so, no disk I/O! Yay! But, the serialization is killing you. What happens when you run several instances of these inefficient queries concurrently? They will push your buffer cache hit ratio well past 99.99%, but, the system will be suffering. The serialization invlolved in locking and latching will bring the system to it's knees. So, in this case, you have the "ideal" BCHR, but your system will run like a dog.

The most effective way to determine the root cause of a performance problem is to specifically analyze the process that's having the problem. Look at the performance profile. Where is time actually being spent? Focus on that! Tools like the 10046 trace at level 8, the Oracle wait interface, and StatsPack, can be very useful in determining the root cause.

-Mark

PS If you go to http://www.hotsos.com/ and click on Library, Cary Millsap has a paper there called something like "Why a 99+% buffer cache hit ratio is bad", and he makes a much more eloquent argument than I do above. Lots of other good papers there too....take some time to read through some of them. Registration is required, but free.

-- 
Mark J. Bobak 
Senior Oracle Architect 
ProQuest Information & Learning 
For a successful technology, reality must take precedence over public
relations, for Nature cannot be fooled.  --Richard P. Feynman, 1918-1988


 


________________________________________
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
On Behalf Of Bryan Wells
Sent: Tuesday, June 06, 2006 3:39 PM
To: Oracle-L
Subject: Re: Statspack ratios help
As a Jr. wanting to be Sr. (someday) I have to ask the stupid questions:


 

why is Buffer % bad?  Dont you want 100% buffer cache hit vs. 100% disk i/o?



 

On 6/6/06, David Sharples <davidsharples_at_gmail.com> wrote: 
looks like you got some nasty sql in there (a buffer cache of 100% is not a
good thing, looks like you are doing much logical io)


 

Anyway, does web logic cache its statements for you, usually a parameter
withing the database connections properties



 

On 06/06/06, Sandeep Dubey <dubey.sandeep_at_gmail.com > wrote: 

Rollback per transaction %:   44.17
Buffer Nowait %:  100.00       Redo NoWait %:   99.97


           Buffer  Hit   %:   100.00    In-memory Sort %:  100.00

           Library Hit   %:  100.00        Soft Parse %:  100.00

        Execute to Parse %:    9.62         Latch Hit %:   99.88

Parse CPU to Parse Elapsd %:   69.06     % Non-Parse CPU:   91.60

With 100% soft parse, execute to parse ratio is so low. Is it bad, how
I can I improve it?

I see rollback per transaction as 44.17. We are using Hibernate that
generates database mapping and produces most of the SQLs. How can I 
invetigate further? But I doubt if application is doing some big time
rollbacks.



-- 
Bryan S Wells
Cell: 303.532.9879
Email: bunjibry_at_gmail.com 

--
http://www.freelists.org/webpage/oracle-l

Received on Tue Jun 06 2006 - 15:38:41 CDT