Re: CHECKPOINTS

From: Steve Adams <steve.adams_at_ixora.com.au>
Date: 2000/03/02
Message-ID: <38be420a.61172551@news.eagles.bbs.net.au>

Hi Craig,

I disagree with your statement, "The larger the log files ... the higher the chance there are more dirty blocks in the block buffer." The number of dirty blocks depends on how big the cache is, what the application is doing, and whether DBWR is keeping up. A normal situation at 7.3.4 under a high block change rate but with an effective DBWR would be to have the LRU end of the cache almost entirely clean (the number of buffers cleaned is about four times the internal write batch size) a few buffers on the dirty list (but less than _db_large_dirty_queue) and about 40% of the remaining buffers dirty but still higher up on the LRU list. If you make the log files bigger, it will have no impact on the number of dirty blocks in the cache (unless the logs were previously so small that the cache had not yet reached a stable state since the previous log switch).

Yes, checkpoint waits can often be avoided by increasing the size of the log files, assuming checkpoints are only happening at log switches, but this is because you have longer before the next log switch, rather than because it somehow affects checkpoint duration. I'm all for big log files, but to fix the problem thus is merely to apply a large band aid. The underlying problem of checkpoint duration indicates that DBWR is slow and if that problem is not fixed it will show up as 'free buffer waits' and 'write complete waits' if it is not already doing so. For this reason, the first thing to target here is the DBWR performance, not the log file size. As to the other issues you suggest, I don't care what the rate of redo generation is, the Oracle architecture requires DBWR to be able to cope with it; and the I/O bottleneck is just a form of DBWR problem.

Regards,
Steve Adams

http://www.ixora.com.au/
http://www.oreilly.com/catalog/orinternals/
http://www.christianity.net.au/

On Wed, 01 Mar 2000 21:19:05 -0800, Craig Shallahamer <craig_at_orapub.com> wrote:

>
>--------------ABAAA3DF8ADC082B8866669A
>Content-Type: text/plain; charset=us-ascii
>Content-Transfer-Encoding: 7bit
>
>Steve, Bad hair day huh...
>
>Regarding my analysis: The checkpoint duration actually does have a lot to
>do with the size of the log files. The larger the log files (which with a
>constant rate of redo generation the logs will try to be switched less
>frequently) the higher the chance there are more dirty blocks in the block
>buffer. Which means the checkpoint can take longer to complete because the
>DBWR has more work to do.
>
>Check point waits do no necessarily mean a DBWR problem. There could be redo
>log sizing issues (the most common problem/solution), application issues
>(e.g., delete tons of rows instead of using the truncate command), and also
>I/O bottleneck issues on either the .dbf or .log I/O subsystems.
>
>So actually we are both right; checkpoint duration depends on how many dirty
>blocks need to be written which is directly related to both the rate of redo
>generation AND redo log size.
>
>[snip]
>
>Best regards,
>
>Craig.
>
>
>Steve Adams wrote:
>
>> Hello Craig,
>>
>> Your analysis: Checkpoint duration has nothing to do with the size or
>> number of the log files, or the rate of redo generation. The amount of
>> work to be done is a function of the number of dirty blocks in cache at
>> the checkpoint SCN and virtually nothing else.
>>
>> Your commercial: That paper has a number of serious errors. Perhaps the
>> most significant is that you claim that the unit of time is milliseconds
>> when in fact is the centiseconds.
>>
>> Regards,
>> Steve Adams
>> http://www.ixora.com.au/
>> http://www.oreilly.com/catalog/orinternals/
>> http://www.christianity.net.au/
>>
>> On Wed, 01 Mar 2000 15:56:11 -0800, Craig Shallahamer <craig_at_orapub.com>
>> wrote:
>>
>> >[snip]
>> >This "hanging" can be a performance killer, but there are a number of
>> >ways to attack this problem. You can tell if you are having this
>> >problem by looking at the "alert.log" file and looking for "cannot
>> >allocate log" or something like that. You can also check the session
>> >wait tables for log sync and related waits. Besides reducing the redo
>> >the application generates, you add additional log files and increase
>> >their size. Basically anything to "slow the clock down." The situation
>> >can get much more complicated than this, but this is a good start.
>> >
>> >The commercial: There are more details about the session wait stuff in a
>> >paper, Quick Contention Identification Using Oracle's Session Wait Views
>> >on my web site. I also have an Internet Video Seminar on this topic as
>> >well. My performance classes also cover this in detail. And, I have
>> >some tools, for free on my web-site, that collect and report on session
>> >wait details.
>> >
>> >Hope this helps,
>> >
>> >Craig.
>> >
>> >Craig A. Shallahamer
>> >President, OraPub, Inc.
>> >
>> >craig_at_orapub.com
>> >http://www.orapub.com
>> >+1.503.636.0228 (tel)
>> >+1.413.812.4406 (fax)
>> >
>> >[snip]
>> >
>> >Mario wrote:
>> >
>> >> We have a problem with redo log switching
>> >> when running large jobs. All groups fill up
>> >> and remain active even though they
>> >> are archived - presumably this is because
>> >> checkpointing is going on.
>> >>
>> >> Finally when the
>> >> last log group fills and wants to switch back to
>> >> the first group it cannot do so, because
>> >> the first group is still active. At this point the system
>> >> slows to a crawl and even trivial queries require minutes
>> >> for a response.
>> >>
>> >> We have
>> >> increased the number of redo log groups
>> >> (almost doubling the space available),
>> >> and have also increased the initialization
>> >> parameter db_block_checkpoint_batch
>> >> so as to allow for faster checkpointing.
>> >> Yet the problem persists.
>> >>
>> >> Does anyone have any pointers for us?
>> >>
>> >> Is there any good documentation on how the checkpoint
>> >> process works and why it takes so long?
>> >>
>> >> Our System
>> >> Oracle V7.3.4, RS/6000, 2CPUs, 1.3 GB Memory, AIX
>> >>
>> >> Thanks in advance,
>> >> Mario
Received on Thu Mar 02 2000 - 00:00:00 CST