Re: Got the darn buffer busy waits under control, at last!

From: Zo$t$ <wizofoz2k_at_yahoo.com.au>
Date: Mon, 17 Jun 2002 20:15:11 +1000
Message-ID: <3d0dbab9$0$28008$afc38c87@news.optusnet.com.au>

Hmmm sorry, I disagree. The log files show clearly there is a marked improvement overall in most buffer related latches. And in quite a few of the stats. Since other minor changes in init.ora in common with this one (notably the reduction of DBFMR) have been done before with nowhere near as strong an impact as this one, I have no other conclusion than that in this instance the spin count helped. Particularly since it resulted in no appreciable increase in overall CPU use.

I'm perfectly aware that it is not a desirable first-tuning parameter. That doesn't mean however we should completely ignore it. It can help and when it does it should not be ignored just because it is politically incorrect to touch it.

I've seen latch free both higher and lower than bbw. In another earlier log with no spin-count whatsoever, this showed:

latch free                                8523         11198
1.31
buffer busy waits                        60760          6327

How do we then define latch free being a "significant wait event"?

As for the relief of the worst bottleneck: I'm familiar with that methodology. It's got some disadvantages as well.

Let me elaborate a little bit. Typical scenario: someone looks at v$sqlarea and pulls out the worst SQL. This is tuned to death. Nothing wrong with that.

Once this bottleneck is removed however, there is a combination of other SQL that causes some other resource to hit a limit. It's the SQL that is wrong but because the other resource has hit the roof, people go and tune that bottleneck. Once that is removed, they go back and find the next bottleneck.

Say it is bad SQL again. That is tuned and suddenly the second bottleneck is back again: there is now more CPU or I/O capacity for the second crook SQL to use up, so it does. And so on.

Ie, bottlenecks can easily be masked behind other causes. It takes nearly instinctive qualities to be able to drop the whole lot down in one iteration. Or successive, monotonous iterations until all major
problems are solved or a performance target is attained.

A valid approach, but tedious and with the potential for much frustration. And it assumes that we have the resources available to closely monitor what is going on all day long. In most cases, that is not what happens. I prefer to concentrate in one aspect and one aspect only (assuming of course there are no other GLARING config errors, such as all datafiles in a single drive and such): the quality of the SQL.

I've found over the years this (and poor application design, but that's harder to fix) to be by far the best approach to getting performance up. SQL that is well tuned and doesn't impose a dramatic load on the system is a pre-condition for any other tuning to be effective.

So that's where I concentrate. Most of the really bad SQL in this particular application has been either weeded out or tuned to the extent where it is not important anymore.

There is one aspect that I can't tune: an "AI" dynamic query tool they use which has a penchant for picking the combinations of SQL that send the 8.0 CBO optimizer completely bananas.

Over time, I've found that it is the use of the "latest" features that usually causes the CBO optimizer to break in these cases. Like the index fast full scans, which were relatively new in 8.0.

And quite a few other bits I've had to disable. Like the hash joins, which insisted on being picked up indiscriminately for every query where there was an imbalance in number of rows in joined tables. Regardless of the consequences of such a hash join or the volume of rows involved.

Unlike so many others, I dispute that hash joins are a universal panacea for join performance. They can be highly effective, but that assumes other pre-requisites have been met. And the Oracle default install does NOT meet those pre-requisites to start with.

Besides, a stupid join is only gonna run much faster (and use much more resources) with hash joins. It won't go away. In most cases that's what I've indeed found: all system resources being hogged by the hash joins trying to make stupid SQL run a lot faster. Wrong approach, IMHO.

Now, I'm perfectly aware that when this thing goes to 8i or 9i, if I leave the init.ora as is I'm going to have problems. It goes without saying that nearly all the extra parameters used here will be commented out in the upgrade to 8i.

And we'll have to undergo testing of the system with only basic tuning applied. It is likely that much SQL will run a lot better and won't need as much tweaking at this level. At least I hope so. Given that we'd be testing the app anyway, this is not a problem.

Under these conditions, I find it hard to believe that any of this tuning will turn around and bite me later. Particularly given that it's been in place for a while and never has. In fact, it's only when we've tried to use some of the more advanced features of both 8.0 and its optimizer that we've been burned badly.

I'm quite sure this will improve with 8i (in fact I'm absolutely sure it will: I've got 8i running in another system and "problem" SQL behaves a lot better there than in vanilla 8.0.

Of course if I end up not doing the upgrade, it is quite possible that some other person may get stuck. IF he/she leaves everything as is and just plonks a new version of the s/w. And expects that will magically solve all problems. It won't, and we both know that. But then again, it will be their problem. They can't claim this is not documented, as it is and ad nauseam.

--
Cheers
Nuno Souto
wizofoz2k_at_yahoo.com.au

"Ricky Sanchez" <rsanchez_at_more.net> wrote in message
news:3D0D381C.792CD86F_at_more.net...


> Nuno-

>

> If latch free was not a significant wait event in the first place,

then


> the spin count exercise is a wasted effort. You are not addressing

the


> real problem, which is the discovery and relief of the worst

bottleneck.


>

> I would suggest you remove the spin count parameter altogether so as

not


> to cloud issues in the future. Yes, you achieved an apparent change

in


> latch behaviour. On the other hand, you cannot really say you

improved


> the instance's performance in any measurable way. Later, when you

tune


> the real issues, the dynamics of the instance may change such that

the


> spin count thing bites you in the ass.

>

> - ricky

Received on Mon Jun 17 2002 - 05:15:11 CDT