Re: Monitoring the "Checkpoint not complete" event
Date: Mon, 26 Oct 2009 21:16:57 +0700
The "Checkpoint not complete" message in the alert log means the database tried to reuse an online redo log file and could not. This happens when DBWR has not yet finished checkpointing the data protected by that redo log, or when ARCH has not finished copying the redo log file to the archive destination.
How often do you see this message in the alert log? Or only when you run "alter system switch logfile"?
If you see it often in the alert log, check the other waits as well. If sessions are waiting on "log file switch (checkpoint incomplete)", "log file switch (archiving needed)" or "log buffer space", that means DBWR and ARCH need tuning.
- Make DBWR faster: enable async I/O, use DBWR I/O slaves (dbwr_io_slaves), or use multiple writer processes (db_writer_processes).
- Add more redo log groups.
- Re-create the log files with a larger size.
- Make checkpointing happen more frequently, either with a smaller block buffer cache or with settings such as FAST_START_MTTR_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUT. This forces DBWR to flush dirty blocks more often.
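As for measuring how long each stall lasts, here is a rough sketch in Python, not something I have run against a production alert log. It assumes the alert log puts a timestamp such as "Mon Oct 26 14:02:11 2009" on its own line before each entry, and that the log switch finally shows up as "Thread N advanced to log sequence M"; adjust TS_FORMAT and the pattern if your version writes something different. The function name checkpoint_stalls and the SAMPLE lines are made up for illustration.

```python
import re
from datetime import datetime

# Assumed alert log timestamp layout, e.g. "Mon Oct 26 14:02:11 2009".
# Parsing day and month names needs an English/C locale.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
ADVANCE_RE = re.compile(r"Thread \d+ advanced to log sequence \d+")


def checkpoint_stalls(lines):
    """Return a list of (message_time, seconds_until_thread_advance)."""
    current_ts = None   # last timestamp line seen
    pending = None      # time of a "Checkpoint not complete" still waiting
    stalls = []
    for raw in lines:
        line = raw.strip()
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        if "Checkpoint not complete" in line and pending is None:
            pending = current_ts
        elif pending is not None and ADVANCE_RE.search(line):
            stalls.append((pending, (current_ts - pending).total_seconds()))
            pending = None
    return stalls


# Hypothetical excerpt, only to show the expected input shape.
SAMPLE = [
    "Mon Oct 26 14:02:11 2009",
    "Thread 1 cannot allocate new log, sequence 1235",
    "Checkpoint not complete",
    "Mon Oct 26 14:02:19 2009",
    "Thread 1 advanced to log sequence 1235",
]

if __name__ == "__main__":
    for when, seconds in checkpoint_stalls(SAMPLE):
        print(when, seconds)
```

You could run this from cron against the tail of the alert log and raise an alarm only when the gap exceeds a threshold you choose, instead of alerting on every single occurrence of the message.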
On Mon, Oct 26, 2009 at 6:29 PM, Radoulov, Dimitre <cichomitiko_at_gmail.com> wrote:
> Hi all,
> I'm trying to figure out how to implement an automated monitoring regarding
> the above mentioned "event".
> When it happens the instance hang may become a problem and *I believe* that
> monitoring the single occurrence
> of the "Checkpoint not complete" message in the alert log is not sufficient
> (the time between that message
> and the following thread advance is quite important as well).
> So what's the logic/how exactly you monitor the "Checkpoint not complete"