Battle Against Any Guess [was "high cpu on query"]

From: Alex Gorbachev <ag_at_oracloid.com>
Date: Sun, 1 Jul 2007 03:40:29 -0400
Message-ID: <c2213f680707010040x2b1aa6cam30cddc4420c1cdde@mail.gmail.com>

<repost>I meant actually to change the subject line to avoid side-tracking Joe's thread but hit send too fast so here it is again...</repost>

Dear list,

It was very interesting to track the development of this thread and I've been waiting whether someone will come up with a suggestion to take a little different investigation approach instead of, well, guesswork. It's a little slow due to the weekend and vacation season and I'm a bit impatient. :) So my apologies to Joe for hijacking this thread.

First of all, it's interesting to have a look at the initial questions:
>
> - what can cause a difference in the execution plans besides the stats and indexes?
> - in general what can cause a query to use a lot of cpu?
>

I don't want to guess here :) so I would like to ask the author how he came up with those questions?

The problem is very poor query performance. There is some additional information - the query performed well few days ago and still performs well on a cloned database. The first impulse is to jump on it with guesses - what might have changed that could have caused performance drop?

Some guesses were already investigated - indexes didn't change as Joe suggested. *Some* stats (table stats? index stats? system stats? column stats as well?) were manually re-synchronized with the same results. Now it's time to ask Oracle gurus to do some guesswork. Typical scenario.

Fair enough - an impressive list was produced (and every point can be elaborated much further) and I'm sure there might be quite a few less obvious but still valid guess that some people might be able to contribute.

Hey, what about another hint - increased CPU consumption? OK... Let's start guessing on this one:

- inappropriate hash join
- sorts
- inappropriate nested loop
- out of control sub-query filtering
- latch contention (maybe workload pattern changed a lot?)

The list can go on an on.

Is that an effective approach to try and verify all our guesses even if they seem to be educated guesses?

Let's step back for a moment... What is the problem we are trying to solve? The query is running extremely slow. OK. We start from there - run it through 10046 and 10053 traces. Get execution plan. Create session profile. Now we can see where the time is spent. Set STATISTICS_LEVEL=ALL and investigate V$SQL_PLAN_STATISTICS - see on which step the time is spent.

We shouldn't ignore additional information (performance changed in the last 2 days) and helpful availability of an older database clone. However, instead of using it to validate our guesses - why not to use it for the benefit of troubleshooting process itself?

- Compare execution plans (author mentioned they are different).
- If execution plan hasn't changed - compare session profile and SQL profile.
- If execution plan changed - compare 10053 trace - simple diff would

be a great start.

Over the last years, thanks to few very smart and dedicated gentlemen, we have gone through revolutionary change in the approach to Oracle performance tuning. The revolution that allowed to avoid great deal of guessing and transformed semi-magical/gut-feeling/intuition-based performance tuning ritual that only best of the best could really master into a well-defined methodological scientific approach that everyone of us, mortals, could follow. Performance analysts could stop relying on intuition and educated (or not) guesses.

Why on earth do we still rely on guesswork in other troubleshooting tasks especially when more deterministic paths are available?

Think about it and recall how many times you found yourself in the same situation - jumping from one guess to another trying to find a shortcut without success. If you recall at least once - you probably would be interested in joining the BAAG Party -- Battle Against Any Guess. If you don't reckon it and managed to avoid guesswork in your troubleshooting process - join as well and spread the world how you managed that.

Some of you already know what the BAAG party is and have already become a member. Some of you already started to contribute. For those of you who doesn't know yet - http://www.BattleAgainstAnyGuess.com. Join us and don't forget to subscribe to the RSS feed.

On 6/29/07, Cary Millsap <cary.millsap_at_hotsos.com> wrote:
> Some more for the list:
>
> 5) Change in schema (create/drop indexes, histograms,
> partition/cluster/parallelism declarations, etc.)
> 6) Stored outline manipulation
> 7) Oracle CBO code change (e.g., upgrade)
>
> Cary Millsap
> Hotsos Enterprises, Ltd.
>
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> On Behalf Of Jeremy Paul Schneider
>
> four things can change an exec plan:
> 1) change in text of sql
> 2) change in init params
> 3) change in object stats (tables and indexes)
> 4) change in system stats
>
> i'm pretty sure that if those four things don't change then your plan can't
> change. do you have a job collecting system stats?
>
> -Jeremy
>
> On 6/29/07, Joe Armstrong-Champ
> <joseph.armstrong-champ_at_tufts.edu> wrote:
>
> We upgraded to 10.2 a month ago and everything was running fine until 2
> days ago when a query started running very slow in prod. It is selecting
> data from a view using distinct and does an 'order by', too. It runs ok
> in a copy of the db which was refreshed just after the upgrade. The
> difference in the number of rows between the 2 databases isn't that much
> but the execution plans are very different. The indexes are the same in
> both. I manually updated the stats for the affected tables in another
> copy which was refreshed yesterday but the plan didn't change. There is
> a high percentage of cpu involved in the long running query.
>
> Questions:
> - what can cause a difference in the execution plans besides the stats
> and indexes?
> - in general what can cause a query to use a lot of cpu?
>
> Thanks.
> Joe

--
Alex Gorbachev, Oracle DBA Brewer, The Pythian Group
http://www.pythian.com/blogs/author/alex http://blog.oracloid.com
BAAG Party - http://www.BattleAgainstAnyGuess.com
--
http://www.freelists.org/webpage/oracle-l

Received on Sun Jul 01 2007 - 02:40:29 CDT