Re: Checkpoints extreeeemely slow?

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: 2000/04/06
Message-ID: <955008682.3509.1.nnrp-01.9e984b29@news.demon.co.uk>

The checkpoint process always runs in 8.1.6, but it has very little to do with log file I/O.

To get a better handle on how long your checkpoints take, you can set log_checkpoint_to_alert = true so that the start and end of each checkpoint are written to the alert log.

Also, check v$session_event for any waits on 'log file%' events (this includes the parallel writes, etc.), and check the MAX wait time as well as the average wait time and the number of time-outs. It may just be that you have an intermittent hardware failure.
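A minimal sketch of that check, assuming the rows have already been fetched into Python (for example with `select sid, event, total_waits, total_timeouts, time_waited, max_wait from v$session_event where event like 'log file%'`). The `EventWaits` tuple, the `summarize_log_waits` helper and the 10x max-to-average threshold are hypothetical choices for illustration, not anything Oracle provides; times are taken to be in hundredths of a second, which is how this view reports them.

```python
from typing import NamedTuple


class EventWaits(NamedTuple):
    """One row of v$session_event (times in hundredths of a second)."""
    sid: int
    event: str
    total_waits: int
    total_timeouts: int
    time_waited: int
    max_wait: int


def summarize_log_waits(rows, spike_ratio=10.0):
    """Aggregate 'log file%' waits per event across sessions and flag oddities.

    An event is flagged if any wait timed out, or if its worst single wait is
    more than spike_ratio times the average wait (a hypothetical threshold).
    """
    totals = {}
    for row in rows:
        if not row.event.startswith("log file"):  # same filter as LIKE 'log file%'
            continue
        waits, timeouts, waited, worst = totals.get(row.event, (0, 0, 0, 0))
        totals[row.event] = (
            waits + row.total_waits,
            timeouts + row.total_timeouts,
            waited + row.time_waited,
            max(worst, row.max_wait),
        )

    report = {}
    for event, (waits, timeouts, waited, worst) in totals.items():
        avg = waited / waits if waits else 0.0
        report[event] = {
            "waits": waits,
            "timeouts": timeouts,
            "avg_cs": avg,
            "max_cs": worst,
            "suspicious": timeouts > 0 or (avg > 0 and worst > spike_ratio * avg),
        }
    return report


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    rows = [
        EventWaits(7, "log file sync", 1000, 0, 2000, 8),
        EventWaits(9, "log file sync", 500, 0, 1000, 300),
        EventWaits(1, "log file parallel write", 4000, 0, 4000, 5),
        EventWaits(7, "db file sequential read", 100, 0, 50, 1),
    ]
    for event, stats in sorted(summarize_log_waits(rows).items()):
        print(event, stats["avg_cs"], stats["max_cs"], stats["suspicious"])
    # log file parallel write 1.0 5 False
    # log file sync 2.0 300 True
```

The average is recomputed from the summed TIME_WAITED and TOTAL_WAITS rather than by averaging each session's own average, so busier sessions carry proportionally more weight. A large maximum against a small average is the kind of pattern that might point towards an intermittent hardware problem rather than a uniformly slow device.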

It is perfectly normal to see log file syncs, but you don't say how long the waits are. Every commit results in a log file sync, which includes a log file write. If you have lots of busy little processes, you are likely to see a couple of them in mid-commit quite frequently. The log file sync and log file write waits should, however, last no longer than a 'reasonable' I/O time for your hardware.

You don't say how big db_block_buffers is. A checkpoint takes time because it writes all dirty blocks to disc, so with a very large buffer cache, lots of dirty blocks and a single writer process, the time required could get into this ballpark.

At 100 blocks per second you get 6,000 blocks per minute, so 15 minutes is only 90,000 blocks. You may simply need to tune your database writer strategy.
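The arithmetic above can be turned into a rough estimate. This is a back-of-the-envelope sketch, assuming a single writer flushing at a constant rate; the helper names are hypothetical, and 100 blocks per second is the illustrative rate from the paragraph above, not a measured figure. The last helper adds the constraint behind the original problem: a checkpoint started at a log switch has to finish before that log group is needed again, which with two groups switching every 20 minutes leaves roughly 20 minutes.

```python
def blocks_written(minutes: float, blocks_per_second: float) -> int:
    """Blocks a single writer can flush in `minutes` at a constant rate."""
    return int(minutes * 60 * blocks_per_second)


def checkpoint_seconds(dirty_blocks: int, blocks_per_second: float) -> float:
    """Seconds needed to write `dirty_blocks` at a constant rate."""
    if blocks_per_second <= 0:
        raise ValueError("write rate must be positive")
    return dirty_blocks / blocks_per_second


def fits_before_wrap(dirty_blocks: int, blocks_per_second: float,
                     minutes_per_switch: float, log_groups: int) -> bool:
    """Rough check that a log-switch checkpoint completes before its log
    group comes round again, i.e. within (log_groups - 1) switch intervals."""
    window_seconds = (log_groups - 1) * minutes_per_switch * 60
    return checkpoint_seconds(dirty_blocks, blocks_per_second) < window_seconds


if __name__ == "__main__":
    print(blocks_written(15, 100))                # 90000
    print(checkpoint_seconds(90_000, 100) / 60)   # 15.0
    print(fits_before_wrap(90_000, 100, 20, 2))   # True: 15 min < 20 min
    print(fits_before_wrap(150_000, 100, 20, 2))  # False: 25 min > 20 min
```

The real dirty-block count and write rate would have to come from your own system; the point is only that a plausible rate for one writer can account for a 15-minute checkpoint.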

--

Jonathan Lewis
Yet another Oracle-related web site:  http://www.jlcomp.demon.co.uk

Greg Stark wrote in message
<874s9guq3f.fsf_at_HSE-Montreal-ppp33181.qc.sympatico.ca>...

>I'm not sure I'm interpreting these numbers correctly, but they seem to be
>implying that the cause of our performance problems is checkpoints that are
>taking upwards of 15 minutes. Is that even possible? One question I have: if
>8.1.6 has a CKPT process running, does that necessarily mean it's using it?
>The CHECKPOINT_PROCESS parameter seems to be gone in 8i.
>
>I've been experimenting with the redo logs; at this point I have the fewest
>variables I can expect to have: two redo logs on two different disks. Each
>is 200MB, and it takes about 20 minutes to switch. The log_checkpoint_timeout
>is 0 and log_checkpoint_interval is 999999999.
>
>The intent is that checkpoints occur only on log file switches and always
>read from the _other_ disk, not the one holding the current active redo log.
>
>What's actually happening is that the checkpoint seems to go on and on for
>about 15 minutes, nearly long enough to bring everything crashing to a halt
>as it wraps around and runs into the active redo log. I believe the
>checkpoint is happening because the following query continues to show two
>values with a difference of one for this duration, and because when I was
>dropping and adding log files I received an ORA-01624 on the non-current log
>file.
>
>SQL> select * from v$sysstat where name like 'background checkpoints%';
>
>STATISTIC# NAME                                  CLASS      VALUE
>---------- -------------------------------- ---------- ----------
>       129 background checkpoints started            8        253
>       130 background checkpoints completed          8        252
>
>15 minutes seems like an incredibly long time to be doing a checkpoint, even
>for 200MB. I was under the impression that checkpoints should take seconds,
>not tens of minutes. There is no heavily used data on the disks with the redo
>logs, and the disk array they're on should be blazingly fast. So I'm very
>frustrated that they seem to be very slow.
>
>And the redo logs do seem to be a bottleneck. When I select from
>v$session_wait I nearly always see at least one session, and often two or
>more, waiting on "log file sync". I'm worried this indicates a serious
>problem with the redo logs.
>
>--
>greg

Received on Thu Apr 06 2000 - 00:00:00 CDT