Re: Beaking down SQL performance by wait time

From: <fmhabash_at_gmail.com>
Date: Fri, 23 Feb 2007 12:53:13 -0500
Message-ID: <45DF2A09.4060508@gmail.com>

SMenue, interesting, indeed.

The objective of any performance tuning method is to be able to tell why did a particular task experience a delay in an unacceptable manner. This is regardless of how and what tools you use to diagnose it. In an ideal world, as a DBA you're given in advance all details about this task and with the help from a KNOWLEDGEABLE application person you funnel it down to a particular SQL. You trace it and most likely you have a solution. However, this does not happen in real world because of ... 1- Normally, there is not this KNOWLEDGEABLE application person available to you unless you have an outside consultant to do the work. 2- When performance degradation strikes, the last thing you can do is stop the rush to ask, 'can I get some details please?'. It just won't happen.

So, what can you do when you get a call from management/users complaining there is general system performance issues. The task of isolating the issue is not difficult or complex by nature. It is just that the data Oracle offers you as a DBA is NOT conclusive only because it is all aggregates whether at session or DB level. So you hear these performance gurus hide behind terms like 'experience', 'advanced knowledge' to justify or explain it. The fact of the matter is that the issue is an inherent deficiency in the data and MOST of the tools including the statspack. Here are some real-life examples ...

1- user waits 50 secs for a transaction that normally takes 3 secs to finish because of an enqueue wait. How can you ever dig this out from a statspack report when the database has a 4000 exec/sec. 2- The whole idea of statspack is you look at your top timed events, then in some magic way correlate it to your top SQL, then go back to your users and say, this is why you were slow. This is guess work at best as far I'm concerned. How can this translate to real end-user experience.
3- application person drops an index by mistake and keeps quiet about it. You get a call from end user complaining that searching for an order now is taking such a long time. The SQL is now doing a FTS due to the missing index. You look at the SP report, the DBScR is not showing at all and the SQL itself is not listed anywhere. 4- user workstation gets hit by a virus and kills its Ethernet bandwidth. His code waits the most on SQL*net related wait events. These events showed no trace in the sp report, the SQL itself is not there, and worst of all, these waits are supposedly 'idle waits'.

Having realized that aggregation is the only way available, the very best I can truly accept is at least to aggregate my wait events at the hash value over total executions. if some how, I can look at what SQL ran in a given time frame and be able to profile each hash in terms of wait events, then I can be comfortable making some diagnosis. If I see hash 5959959 spent 50% on DBSR, 10% on CPU, and 30% direct writes, and 10% SQL*net related, I can certainly figure out what's wrong. I know of no tools that does this (outside of tkprof having to turn SQL trace system wide).

I truly believe unless we account for resource usage and wait time at the most basic functional unit of a database i.e. SQL, performance tuning will continue to be art more so than science. It will be simply be a matter of the DBA keeping track of every performance degradation event he/she ever witnessed and use this experience to solve future ones. It need not be this way.

--
http://www.freelists.org/webpage/oracle-l

Received on Fri Feb 23 2007 - 11:53:13 CST