Re: 10gR1 - Checkpoint not complete (comp.databases.oracle.server, via the Oracle FAQ archive)
Joel,
Here is what v$sgainfo looks like.
NAME                            |           BYTES
--------------------------------|----------------
Fixed SGA Size                  |       1,309,712
Redo Buffers                    |         524,288
Buffer Cache Size               |   2,147,483,648
Shared Pool Size                |   1,073,741,824
Large Pool Size                 |     218,103,808
Java Pool Size                  |      33,554,432
Streams Pool Size               |               0
Granule Size                    |      16,777,216
Maximum SGA Size                |   6,442,450,944
Startup overhead in Shared Pool |     100,663,296
Free SGA Memory Available       |   2,952,790,016
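(For anyone who wants the same listing, this is just V$SGAINFO; a minimal sketch of the query, with SQL*Plus column formatting assumed:)

```sql
-- sketch: reproduce the SGA listing above from SQL*Plus
SET LINESIZE 60
COLUMN name  FORMAT a32 HEADING 'NAME'
COLUMN bytes FORMAT 999,999,999,999
SELECT name, bytes FROM v$sgainfo;
```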
Our application does a lot of very quirky things that are painful for a
DBA, but until we redesign the application next year, I just have to
find ways to deal with the thrashing of the database.

For example, at any point during the day our customers can batch-update
any column in their tables, 500,000 rows at a time, while individual
users are simultaneously updating their own records ... so the batch
updates block the single-record updates. Each customer has their own
tables, which causes lots of similar SQL ... that's the reason for the
2GB buffer cache.

So the top 5 wait events change all the time ... we can be really slow,
and then all of a sudden 3-5 customers start batch-updating columns in
tables of anywhere from 15MM to 40MM rows ... this is part of the reason
we generate so much redo.
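As a back-of-the-envelope sanity check (my own math, using the 600MB log size and the 600-700K/sec average redo rate quoted elsewhere in this thread):

```python
# Rough estimate: how fast a 600 MB redo log fills at the average redo
# rates quoted in this thread (the figures are from the thread; the
# arithmetic is mine).
LOG_MB = 600

for rate_kb_per_s in (600, 700):
    seconds = LOG_MB * 1024 / rate_kb_per_s   # KB in the log / KB written per second
    print(f"{rate_kb_per_s} KB/s -> log switch roughly every {seconds / 60:.1f} min")
```

That lands at roughly 15-17 minutes per switch at the average rate, which lines up with the observed 20+ minute switch intervals once quieter periods are mixed in.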
Our top five events this morning were these.
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~
                                                          % Total
Event                              Waits    Time (s)    DB Time  Wait Class
------------------------------ --------- ----------- ---------- -----------
db file sequential read        4,598,287      28,970      35.84  User I/O
CPU time                                      20,952      25.92
buffer busy waits                 16,131      15,356      19.00  Concurrency
db file parallel write            34,904      13,730      16.99  System I/O
db file scattered read           292,277       6,568       8.13  User I/O
-------------------------------------------------------------
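For what it's worth, the "% Total DB Time" column is just each component's time over the implied total timed DB time; backing the total out of the first row reproduces the other percentages (a quick cross-check of my own, not part of the report):

```python
# Times in seconds from the "Top 5 Timed Events" table above.
times = {
    "db file sequential read": 28_970,
    "CPU time":                20_952,
    "buffer busy waits":       15_356,
    "db file parallel write":  13_730,
    "db file scattered read":   6_568,
}
# 28,970 s is stated to be 35.84% of total timed DB time -> implied total
total = times["db file sequential read"] / 0.3584
for event, t in times.items():
    print(f"{event:30s} {100 * t / total:6.2f}%")
```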
'buffer busy waits' was because of a batch update running in parallel ... probably updating records in the same or nearby blocks.
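(If it helps anyone following along, a quick way to see which block classes those buffer busy waits pile up on is V$WAITSTAT; a sketch, keeping in mind the counters are cumulative since instance startup:)

```sql
-- cumulative buffer busy waits by block class since instance startup
SELECT class, count, time
  FROM v$waitstat
 ORDER BY time DESC;
```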
I understand that the LOG_CHECKPOINT_* parameters changed and that the recommendation is to use FAST_START_MTTR_TARGET, but when I upgraded from 8.1.7 to 10.1.0.4 I had other battles to fight.
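For the record, the switch-over itself is small; a hypothetical sketch (the 300-second target below is my placeholder, not a recommendation for this system):

```sql
-- hypothetical: move from the legacy interval/timeout parameters to
-- MTTR-based incremental checkpointing
ALTER SYSTEM SET log_checkpoint_interval = 0;
ALTER SYSTEM SET log_checkpoint_timeout  = 0;
ALTER SYSTEM SET fast_start_mttr_target  = 300;  -- target crash-recovery time in seconds (placeholder value)

-- then watch V$INSTANCE_RECOVERY for the advised redo log size (MB)
SELECT optimal_logfile_size FROM v$instance_recovery;
```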
-peter
joel garry wrote:
> p.santos000_at_gmail.com wrote:
> > Hey guys,
> > I thought I'd try to get some help in getting this problem resolved.
> > I've struggled with the 'Checkpoint not complete' error message
> > before, and thought I had it resolved, but I can't seem to make it go
> > away.
> >
> > I'm currently running 10.1.0.4 on Solaris 64bit.
> > From my analysis of this issue, I get the "Checkpoint not complete"
> > message nearly every time during a log switch. From looking at the
> > checkpoint error timestamps and the log switches, 99% of the time the
> > CKPT process fails during the log switch.
> >
> > I've always been reluctant to checkpoint too much, but maybe I need
> > to checkpoint more often, so that the entire process is faster during
> > log switches?
> >
> > Here is the data (log switches are occurring around 20+ minutes apart):
> >
> > V$LOG
> > =================
> >
> > GROUP#| SEQUENCE#|    MBYTES|ARC|STATUS         |FIRST_TIME    |LASTED_MIN
> > ------|----------|----------|---|---------------|--------------|----------
> >      1|     32198|       600|YES|INACTIVE       |10/27/06 09:24|        21
> >      2|     32199|       600|YES|INACTIVE       |10/27/06 09:44|        26
> >      3|     32200|       600|NO |CURRENT        |10/27/06 10:11|         6
> >      4|     32193|       600|YES|INACTIVE       |10/27/06 07:12|        27
> >      5|     32194|       600|YES|INACTIVE       |10/27/06 07:39|        26
> >      6|     32195|       600|YES|INACTIVE       |10/27/06 08:05|        40
> >      7|     32196|       600|YES|INACTIVE       |10/27/06 08:45|        29
> >      8|     32197|       600|YES|INACTIVE       |10/27/06 09:14|        10
> >
> >
> > ALERT_LOG (timestamps match log switch)
> > =================
> > --- These all happen during log switch...
> >
> > Fri Oct 27 08:05:31 2006
> > Thread 1 cannot allocate new log, sequence 32195
> > Checkpoint not complete
> > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
> >
> >
> > Fri Oct 27 08:45:18 2006
> > Thread 1 cannot allocate new log, sequence 32196
> > Checkpoint not complete
> > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
> >
> >
> > Fri Oct 27 09:24:21 2006
> > Thread 1 cannot allocate new log, sequence 32198
> > Checkpoint not complete
> > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
> >
> >
> > Fri Oct 27 09:44:59 2006
> > Thread 1 cannot allocate new log, sequence 32199
> > Checkpoint not complete
> > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
> >
> >
> > Fri Oct 27 10:11:13 2006
> > Thread 1 cannot allocate new log, sequence 32200
> > Checkpoint not complete
> > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
> >
> >
> > KEY PARAMETERS
> > =====================
> >
> > NAME |TYPE |VALUE
> > -------------------------------------------|----------------|--------
> > log_checkpoint_interval |integer |80000
> > log_checkpoint_timeout |integer |1800
> > fast_start_mttr_target |integer |0
> > log_buffer |integer |524288
> > disk_asynch_io |boolean |TRUE
> >
> > ** Haven't configured fast_start_mttr_target because the LOG_CHECK*
> > parameters were already set when I upgraded from 8.1.7, so it was
> > just one less thing to worry about configuring...
> >
> > AWR report from 9-10 AM this morning
> > ========================
> > Statistic                                                   | Total
> > ------------------------------------------------------------|------
> > DBWR checkpoints                                            |     4
> > background checkpoints completed                            |     3
> > background checkpoints started                              |     2
> >
> > - The way I see it, I'm probably not checkpointing enough, which
> > causes the checkpoint at log switch to take too long .. is this
> > accurate?
> >
> > BTW - fixing the application to generate less redo has been explored,
> > and we can't do any more on that front .... We generate anywhere
> > between 200K - 1MB of redo per second ... averaging around 600-700K
> > per second.