Re: 10gR1 - Checkpoint not complete -

From: joel garry <joel-garry_at_home.com>
Date: 27 Oct 2006 12:06:14 -0700
Message-ID: <1161975974.726809.138680@i3g2000cwc.googlegroups.com>

p.santos000_at_gmail.com wrote:
> Hey guys,
> I thought I'd try to get some help in getting this problem
> resolved.
> I've struggled with the 'Checkpoint not complete' error message
> before, and thought I had it
> resolved, but I can't seem to make it go away.
>
> I'm currently running 10.1.0.4 on Solaris 64bit.
> From my analysis of this issue, I get the "Checkpoint not complete"
> message nearly every time
> during a log switch. From looking at the checkpoint error timestamps
> and the log switches,
> 99% of the time the CKPT process fails during the log switch.
>
> I've always been reluctant to checkpoint too much, but maybe I need
> to increase checkpoints,
> so that the entire process is faster during log switches?
>
> Here is the data. (log switches are occurring roughly every 20+ minutes)
>
> V$LOG
> =================
>
>     GROUP#| SEQUENCE#|    MBYTES|ARC|STATUS         |FIRST_TIME    |LASTED_MIN
> ----------|----------|----------|---|---------------|--------------|----------
>          1|     32198|       600|YES|INACTIVE       |10/27/06 09:24|        21
>          2|     32199|       600|YES|INACTIVE       |10/27/06 09:44|        26
>          3|     32200|       600|NO |CURRENT        |10/27/06 10:11|         6
>          4|     32193|       600|YES|INACTIVE       |10/27/06 07:12|        27
>          5|     32194|       600|YES|INACTIVE       |10/27/06 07:39|        26
>          6|     32195|       600|YES|INACTIVE       |10/27/06 08:05|        40
>          7|     32196|       600|YES|INACTIVE       |10/27/06 08:45|        29
>          8|     32197|       600|YES|INACTIVE       |10/27/06 09:14|        10
>
>
> ALERT_LOG (timestamps match log switch)
> =================
> --- These all happen during log switch...
>
> Fri Oct 27 08:05:31 2006
> Thread 1 cannot allocate new log, sequence 32195
> Checkpoint not complete
> Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
>
>
> Fri Oct 27 08:45:18 2006
> Thread 1 cannot allocate new log, sequence 32196
> Checkpoint not complete
> Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
>
>
> Fri Oct 27 09:24:21 2006
> Thread 1 cannot allocate new log, sequence 32198
> Checkpoint not complete
> Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
>
>
> Fri Oct 27 09:44:59 2006
> Thread 1 cannot allocate new log, sequence 32199
> Checkpoint not complete
> Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
>
>
> Fri Oct 27 10:11:13 2006
> Thread 1 cannot allocate new log, sequence 32200
> Checkpoint not complete
> Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
>
>
> KEY PARAMETERS
> =====================
>
> NAME |TYPE |VALUE
> -------------------------------------------|----------------|--------
> log_checkpoint_interval |integer |80000
> log_checkpoint_timeout |integer |1800
> fast_start_mttr_target |integer |0
> log_buffer |integer |524288
> disk_asynch_io |boolean |TRUE
>
> ** Haven't configured fast_start_mttr_target because the log_checkpoint_*
> parameters were already set when I upgraded from 8.1.7, so it was just
> one less thing to worry about configuring...

Are you saying you used 8.1.7 parameters for 10g? I hope I'm misreading that. See the last sentence in
http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html
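
For a sense of scale, here is a rough Python sketch of what the quoted log_checkpoint_interval setting amounts to. It assumes the parameter is counted in operating-system blocks and that the OS block size is 512 bytes (common on Solaris, but check your platform). The 650 KB/s rate is just the midpoint of the 600-700K average quoted further down, and the function names are made up for illustration.

```python
# Rough arithmetic only. The 512-byte OS block size is an assumption,
# not something stated in the post.
OS_BLOCK_BYTES = 512


def interval_bytes(log_checkpoint_interval, block_bytes=OS_BLOCK_BYTES):
    """Redo volume in bytes represented by a log_checkpoint_interval value."""
    return log_checkpoint_interval * block_bytes


def seconds_of_redo(num_bytes, redo_bytes_per_sec):
    """How long it takes to generate num_bytes of redo at a steady rate."""
    return num_bytes / redo_bytes_per_sec


lag = interval_bytes(80000)                      # value from KEY PARAMETERS
print(f"{lag / 2**20:.1f} MB of redo")           # checkpoint lag in megabytes
print(f"{seconds_of_redo(lag, 650 * 1024):.0f} s at 650 KB/s")
```

If those assumptions hold, the interval is a small fraction of a 600 MB log, which is worth keeping in mind when comparing it with the 1800 second timeout.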

In addition to what Mark said, you should post your top waits and show your SGA settings. I expect some issues with the size of your SGA and your log_buffer.

Also you may want to research the use of async i/o on your particular system.

>
> AWR report from 9-10 AM this morning
> ========================
> Statistic |Total
> ----------------------------------------------------------- | ......
> DBWR checkpoints | 4
> background checkpoints completed | 3
> background checkpoints started | 2
>
> - The way I see it, I'm probably not checkpointing enough, which
> causes the CKPT
> process during log switches to take too long. Is this accurate?
>
> BTW - fixing the application to generate less redo has been explored, and
> we can't do any more on that front. We generate anywhere between
> 200K - 1MB of redo per second, averaging out around 600-700K per
> second.
>
> as always any feedback is helpful...

You may need more and smaller logs with a bigger log buffer.
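
To put numbers on that, a back-of-the-envelope Python sketch of how long a log of a given size lasts at the redo rates quoted above. The 600 MB size and the 200K to 1MB per second range come from the post; treating K as 1024 bytes and the helper name minutes_to_fill are my assumptions.

```python
def minutes_to_fill(log_mb, redo_kb_per_sec):
    """Minutes needed to fill a log of log_mb megabytes at a steady redo rate."""
    log_kb = log_mb * 1024
    return log_kb / redo_kb_per_sec / 60


# Low, average, and peak rates from the post (KB per second).
for rate_kb in (200, 650, 1024):
    minutes = minutes_to_fill(600, rate_kb)
    print(f"{rate_kb:>5} KB/s -> {minutes:5.1f} min per 600 MB log")
```

The same function can be used to try out smaller log sizes and see how often they would switch at peak rate.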

45 minutes for a checkpoint not to complete is rather worrisome. Your application code may be doing something that keeps fighting over dirty buffers - do you have something that repeatedly updates a small number of rows?
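
One way to check how the stalls are spaced is to pull the timestamps out of the alert log. A small Python sketch follows, using the excerpt quoted above as sample input. The function names are hypothetical, and the timestamp pattern assumes the alert log format shown in the post.

```python
import re
from datetime import datetime

# Sample input copied from the alert log excerpt in the post.
ALERT_SAMPLE = """\
Fri Oct 27 08:05:31 2006
Thread 1 cannot allocate new log, sequence 32195
Checkpoint not complete
Fri Oct 27 08:45:18 2006
Thread 1 cannot allocate new log, sequence 32196
Checkpoint not complete
Fri Oct 27 09:24:21 2006
Thread 1 cannot allocate new log, sequence 32198
Checkpoint not complete
Fri Oct 27 09:44:59 2006
Thread 1 cannot allocate new log, sequence 32199
Checkpoint not complete
Fri Oct 27 10:11:13 2006
Thread 1 cannot allocate new log, sequence 32200
Checkpoint not complete
"""

# Matches lines such as "Fri Oct 27 08:05:31 2006".
STAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \d{4}$")


def checkpoint_stalls(text):
    """Return the timestamp of every entry that reports 'Checkpoint not complete'."""
    stalls, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if STAMP.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif line == "Checkpoint not complete" and current is not None:
            stalls.append(current)
    return stalls


def gaps_seconds(stamps):
    """Seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


stamps = checkpoint_stalls(ALERT_SAMPLE)
for when, gap in zip(stamps[1:], gaps_seconds(stamps)):
    print(f"{when:%H:%M:%S}  {gap / 60:.1f} min since previous stall")
```

Run against the full alert log rather than the excerpt, this shows whether every switch stalls or only some of them.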

--
@home.com is bogus.
So which DS should I buy the kids?
http://www.signonsandiego.com/uniontrib/20061027/news_1b27sony.html

Received on Fri Oct 27 2006 - 14:06:14 CDT