Re: CHECKPOINTS

From: Steve Adams <steve.adams_at_ixora.com.au>
Date: 2000/03/02
Message-ID: <38beaa66.87888496@news.eagles.bbs.net.au>#1/1

Hi Mario,

Thanks for that. If by 50M you mean 50,000K then your log_checkpoint_interval is an exact match to the log file size, because it is measured in 512 byte blocks on AIX.

The other parameter settings seem OK too. Your internal write batch size works out at 400, so the checkpoint batch size is reasonable at 64. However, you only have 3 disks, so these numbers are larger than ideal and introduce a risk of 'write complete waits'. It would be better to set both db_files and db_block_checkpoint_batch to 30, however, that is not the problem we are facing here. There is also a school of thought that multiple db_writers works better than aio. I think it is true, but once again, I don't think that is your problem.

The problem is that you have too few physical disks to support that cache size. You need to either
1) spread the data files over more physical disks (striping works well) to improve the DBWR bandwidth, or
2) reduce db_block_buffers to limit the backlog of dirty buffer, or 3) increase the log file size to give DBWR more time to finish the job.

Presently, your cache is 250M, so lets say that 100M of that is dirty at a log switch. DBWR has to write all that to just 3 disks using a random I/O pattern, more quickly than LGWR can write 50M to one disk using a sequential I/O pattern. Under intensive redo generation, LGWR is going to win that race every time.

As to why your redo generation is so intensive, I cannot say. It depends on what you are doing. There are some tricks to reduce redo generation, but it is something that most people just live with. And, yes, rollback segment changes have to be logged to allow for crash recovery.

Hope this helps,
Regards,
Steve Adams

http://www.ixora.com.au/
http://www.oreilly.com/catalog/orinternals/
http://www.christianity.net.au/

On Thu, 2 Mar 2000 15:26:19 +0100, "Mario" <mtechera_at_yahoo.com> wrote:

>Steve,
>
>The log files are 50 MB each (10 groups, see previous post).
>
>log_checkpoint_interval = 100000
>log_checkpoint_timeout = 0
>
>Let me know if you get any ideas.
>
>Mario
>
>Steve Adams <steve.adams_at_ixora.com.au> wrote in message
>news:38be56e7.66513170_at_news.eagles.bbs.net.au...
>> Hi Mario,
>>
>> I should also have asked for the size of the log files, and the values of
>> log_checkpoint_interval and log_checkpoint_timeout, just in case you are
>> checkpointing too intensively for a checkpoint to finish before the next
one
>> starts.
>>
>> Regards,
>> Steve Adams
>> http://www.ixora.com.au/
>> http://www.oreilly.com/catalog/orinternals/
>> http://www.christianity.net.au/
>>
>>
>> On Thu, 02 Mar 2000 04:05:48 GMT, steve.adams_at_ixora.com.au (Steve Adams)
wrote:
>>
>> >Hi Mario,
>> >
>> >This is a DBWn bottleneck.
>> >Some questions ...
>> >
>> >What version of AIX?
>> >Is aio available (it should be)?
>> > To find out, do the following as root
>> > # smit aio
>> > select "Change/Show Characteristics of Asynchronous I/O"
>> >How many datafiles do you have?
>> >How many physical disks (spindles) underlying the data files (not
>> >counting log files)?
>> >What are the values of the following parameters?
>> > use_async_io
>> > db_writers
>> > db_files
>> > db_file_simultaneous_writes
>> > _db_block_write_batch (if set)
>> > db_block_checkpoint_batch
>> > db_block_buffers
>> > db_block_size
>> >
>> >Regards,
>> >Steve Adams
>> >http://www.ixora.com.au/
>> >http://www.oreilly.com/catalog/orinternals/
>> >http://www.christianity.net.au/
>> >
>> >
>> >On Wed, 1 Mar 2000 23:12:32 +0100, "Mario" <mtechera_at_yahoo.com> wrote:
>> >
>> >>
>> >>We have a problem with redo log switching
>> >>when running large jobs. All groups fill up
>> >>and remain active even though they
>> >>are archived - presumably this is because
>> >>checkpointing is going on.
>> >>
>> >>Finally when the
>> >>last log group fills and wants to switch back to
>> >>the first group it cannot do so, because
>> >>the first group is still active. At this point the system
>> >>slows to a crawl and even trivial queries require minutes
>> >>for a response.
>> >>
>> >>We have
>> >>increased the number of redo log groups
>> >>(almost doubling the space available),
>> >>and have also increased the initialization
>> >>parameter db_block_checkpoint_batch
>> >>so as to allow for faster checkpointing.
>> >>Yet the problem persists.
>> >>
>> >>Does anyone have any pointers for us?
>> >>
>> >>Is there any good documentation on how the checkpoint
>> >>process works and why it takes so long?
>> >>
>> >>Our System
>> >>Oracle V7.3.4, RS/6000, 2CPUs, 1.3 GB Memory, AIX
>> >>
>> >>Thanks in advance,
>> >>Mario
>> >>
>> >>
>> >>
>> >>
>> >
>>
>
>
Received on Thu Mar 02 2000 - 00:00:00 CST