Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: Data Buffer Cache

From: Richard Foote <richard.foote_at_bigpond.com>
Date: Thu, 19 Sep 2002 17:03:53 +1000
Message-ID: <Yuei9.35691$g9.101631@newsfeeds.bigpond.com>


Hi Michael,

Not quite.

Because if you wait long enough before you do the washing up, someone else might have cleaned and rinsed a glass before you wash it up with the rest !!

Let me illustrate with a simple example.

Scenario 1 (The conventional way ?)

You set the fast_start_* parameters to 0 and checkpoint only at every log switch. You log switch every hour.

This means that every hour, a "mega" checkpoint is issued that requires *ALL CURRENTLY DIRTIED BLOCKS TO BE WRITTEN TO DISK*. This causes much stress to poor DBWR(s) and has a measurable effect on performance. This happens at the following times:

9:00am (things slow down for a few minutes)
10:00am (things slow down for a few minutes)
11:00am (things slow down for a few minutes)
12:00pm (things slow down for a few minutes)
1:00pm (things slow down for a few minutes)
2:00pm (things slow down for a few minutes)
3:00pm (things slow down for a few minutes)
4:00pm (things slow down for a few minutes)
5:00pm (things slow down for a few minutes)

At these times, when performance is important, performance can appreciably suffer for several minutes at a time while these checkpoints are generated.
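If it helps to see it concretely, a Scenario-1 style setup might look something like this in an init.ora. This is an illustrative sketch, not a recommendation: parameter names are as in the 8i/9i releases (fast_start_mttr_target is the 9i successor to fast_start_io_target), and a value of 0 disables each target, but verify against your own release before copying anything.

```
# Scenario 1: no incremental checkpoint targets --
# checkpoint only at each (hourly) log switch
fast_start_io_target    = 0    # 8i parameter; 0 disables it
fast_start_mttr_target  = 0    # 9i parameter; 0 disables it
log_checkpoint_interval = 0    # 0 disables the redo-blocks target
log_checkpoint_timeout  = 0    # 0 disables the time-based target
```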

Scenario 2 (The Tricky Dicky Way ;)

For simplicity, let's assume the same redo log configuration, but with the fast_start_* parameters set very high to enable continuous (or incremental) checkpoints in such a way that the 90%-of-the-smallest-redo-log rule is the most aggressive target in play. Meaning that Oracle will post DBWR once more than 90% of the smallest redo log has been written. In a real system another target might turn out to be more aggressive, but for simplicity, let's assume this one is.
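A Scenario-2 style setup can be sketched the same way (again an illustration under the assumptions above, with a deliberately made-up "very high" value; check the valid range for your release):

```
# Scenario 2: make the fast_start_* target so lax that the built-in
# "90% of the smallest redo log" rule is the binding constraint
fast_start_io_target    = 1000000   # hypothetical, deliberately huge value
log_checkpoint_interval = 0
log_checkpoint_timeout  = 0
```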

Assuming it takes approximately 1 hour to fill a redo log, it will then take approximately 54 minutes to fill 90% of it. Therefore nothing happens for the first 54 minutes, at which point incremental checkpointing kicks in. From this point Oracle posts DBWR(s) with checkpoint requests to ensure the block(s) dirtied 54 minutes ago are written to disk. However, ding ding ding, key point: *these blocks in all probability have already been written to disk*. They have migrated to death row on the LRU list and aged out, with DBWR writing them to disk as part of its usual activity. This checkpoint request *has generated no additional writes* !! (except to mark the controlfile).

We move on and more redo is generated. DBWR is posted again to check on the blocks dirtied 54 minutes before this point, and again, in all probability the work has already been done. Yes, occasionally a block is found that still needs to be written to disk; so be it. The likelihood of a block needing to be physically checkpointed is directly related to the interval you tune these parameters to. You are in control.

You might be thinking that DBWR continually having to check whether a particular block has already been written might be expensive. But since Oracle8, dirty buffers have been linked on a "checkpoint list", ordered by when they were first dirtied. The overhead is therefore minimal: the block(s) to be checkpointed are either at the head of the queue (the oldest dirtied blocks) or they are not on it at all (meaning they've already been written to disk). It's that simple.
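The cheap head-of-queue check can be sketched as a toy simulation. This is an illustrative model of the mechanism just described, not Oracle code; every name in it is made up:

```python
from collections import deque

class BufferCache:
    """Toy model of the post-Oracle8 checkpoint queue: dirty buffers are
    linked in the order they were first dirtied, so an incremental
    checkpoint only ever has to look at the head of the queue."""

    def __init__(self):
        self.ckpt_queue = deque()   # (block_id, dirty_time), oldest first
        self.dirty = set()          # blocks currently dirty in the cache
        self.ckpt_writes = 0        # writes forced by checkpoint requests
        self.aged_writes = 0        # writes done by normal LRU aging

    def dirty_block(self, block_id, now):
        # First-time dirtying links the buffer onto the checkpoint queue.
        if block_id not in self.dirty:
            self.dirty.add(block_id)
            self.ckpt_queue.append((block_id, now))

    def age_out(self, block_id):
        # DBWR writes the block as part of its usual LRU activity.
        if block_id in self.dirty:
            self.dirty.discard(block_id)
            self.aged_writes += 1

    def incremental_checkpoint(self, target_time):
        # Write only blocks dirtied at or before target_time that are
        # STILL dirty; already-written blocks cost nothing but a pop.
        while self.ckpt_queue and self.ckpt_queue[0][1] <= target_time:
            block_id, _ = self.ckpt_queue.popleft()
            if block_id in self.dirty:      # still dirty: a real write
                self.dirty.discard(block_id)
                self.ckpt_writes += 1       # else: no additional I/O at all

if __name__ == "__main__":
    cache = BufferCache()
    for t, blk in enumerate([1, 2, 3, 4, 5]):   # dirty 5 blocks at t=0..4
        cache.dirty_block(blk, t)
    for blk in [1, 2, 3, 4]:                    # LRU aging writes 4 of them
        cache.age_out(blk)
    cache.incremental_checkpoint(target_time=3) # target covers t<=3
    print(cache.ckpt_writes, cache.aged_writes) # -> 0 4: no extra ckpt I/O
```

As in the scenario above, the checkpoint generated no additional writes because the blocks it covered had already aged out naturally.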

The disadvantage ? Well, in this particular scenario, instance recovery will require approximately 90% of a redo log to be applied. But this may not be an issue; my overriding concern here is performance. Also, a clean shutdown might require a more significant flush of the buffer cache. The biggest cost is the controlfile being updated more regularly.

The net positive effect of Scenario 2 is the following:

9:00am (runs like a charm)
10:00am (runs like a charm)
11:00am (runs like a charm)
12:00pm (runs like a charm)
1:00pm (runs like a charm)
2:00pm (runs like a charm)
3:00pm (runs like a charm)
4:00pm (runs like a charm)
5:00pm (runs like a charm)
...

Also please note this: the *NUMBER OF CHECKPOINT BLOCKS MIGHT ACTUALLY DECREASE* !! Yes, DBWR might write fewer checkpoint blocks, because we are giving the dirtied blocks sufficient time to age out naturally.

The net effect could be *overall performance improvement*.

Nothing like a real life story to finish up with. At a site I have been supporting, they're performing redo log switches approximately every 30 minutes. They had intentionally set fast_start_io_target to 0 for fear of performance issues (note that the default is the number of blocks in the buffer cache). Since I converted them to incremental checkpoints, the number of checkpoint blocks has *decreased* (under what appears to be a comparable load). This is easily measured in the Statspack reports. Predictable recovery times, no periodic stutters in performance, no noticeable effect on performance *at any time*. I'm still in the process of tweaking for the "sweet spot" where the checkpoint count levels out, in order to optimise instance recovery times. Attempting to have your cake and eat it too ?

This is one environment; other environments might see differing results, and there are a lot of ifs and buts with all this. I emphasise: this needs to be carefully set, monitored and tuned according to the requirements of the database. The size of the buffer cache is (for better and for worse) a key contributing factor.
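Outside of Statspack, the same counters can be watched directly. A hedged sketch (the statistic and column names are as I recall them from the 8i/9i dynamic views, so verify them against your own release):

```
-- Buffers written on behalf of checkpoints vs. all physical writes
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('DBWR checkpoint buffers written', 'physical writes');

-- Where the incremental checkpoint target currently sits
SELECT recovery_estimated_ios, actual_redo_blks, target_redo_blks
  FROM v$instance_recovery;
```

Sampling the first query at intervals before and after the change gives the per-period checkpoint write counts discussed above.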

However, suggestions that these parameters are bad for performance, or that setting them automatically results in sub-optimal performance, are *potentially correct* at best, and misleading and fundamentally wrong at worst. If set "aggressively", with (say) a 10 second instance recovery target, then the checkpoint block count escalates and performance most certainly suffers. If set and tuned appropriately, and if performance is the overriding consideration, then these parameters could *improve* performance measurably at certain times and not measurably impact it at others, because this mechanism allows dirtied blocks to age out naturally. The "mega" checkpoint doesn't, other than by delaying them for as long as possible; and when it comes, performance can appreciably suffer, as all currently dirtied blocks, even those just changed, must be written to disk.

Note that Oracle, in a manner, recommends the use of these parameters: they're turned *on* by default (or at least fast_start_io_target is). If I haven't convinced you, then go back and set them to 0 !!

Cheers

Richard

"Michael J. Moore" <hicamel_x_the_spam_at_attbi.com> wrote in message news:uP1i9.15918$L12.3632_at_sccrnsc02...
> This sounds a little like the argument as to when it is best to wash the
> dishes ...
> 1) Wash them as you go
> 2) Wash them at the end of the day
> I guess if you are just sitting around watching the TV, then you might as
> well wash them as you go.
> However, if you are running the Boston Marathon, then clearly stopping now
> and then to wash the dishes is not going to make you lightning fast.
> Mike.
>
>
>
Received on Thu Sep 19 2002 - 02:03:53 CDT

