Re: Monitoring the "Checkpoint not complete" event

From: Radoulov, Dimitre <cichomitiko_at_gmail.com>
Date: Tue, 27 Oct 2009 09:42:13 +0100
Message-ID: <4AE6B265.60400_at_gmail.com>


Exactly.
Thank you Niall! We had an automatic Sun Cluster fault monitor switch because of this a few days ago.
I don't believe that waking up the DBA at 3 o'clock in the morning is appropriate if the "Checkpoint not complete" message is immediately followed by a log file switch completion. Hence I wanted to know whether, and how, other sites/shops implement this kind of monitoring.
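
A minimal sketch of that idea, for illustration only: scan the alert log and report a "Checkpoint not complete" only when the following thread advance arrives later than some threshold. It assumes the text alert log writes each timestamp on its own line in the form "Tue Oct 27 03:00:01 2009" and logs the switch as "Thread N advanced to log sequence M"; the function names and the 30-second default are hypothetical, and a single instance (one redo thread) is assumed.

```python
import re
from datetime import datetime

# Assumed timestamp layout of the text alert log, e.g. "Tue Oct 27 03:00:01 2009".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
# Assumed wording of the message written when the log switch finally succeeds.
ADVANCED = re.compile(r"Thread \d+ advanced to log sequence")


def parse_timestamp(line):
    """Return a datetime if the line is a bare timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def checkpoint_stalls(lines, threshold_seconds=30):
    """Return (start_time, seconds_waited) for each stall above the threshold."""
    stalls = []
    current_ts = None   # most recent timestamp line seen
    pending = None      # time of a "Checkpoint not complete" not yet resolved
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts
            continue
        if current_ts is None:
            continue    # no timestamp seen yet, nothing to measure against
        if "Checkpoint not complete" in line:
            if pending is None:
                pending = current_ts
        elif pending is not None and ADVANCED.search(line):
            waited = (current_ts - pending).total_seconds()
            if waited > threshold_seconds:
                stalls.append((pending, waited))
            pending = None
    return stalls
```

Called as checkpoint_stalls(open(path_to_alert_log)), this only reports stalls that have already ended. A live check would also have to compare a still-unresolved "Checkpoint not complete" against the current time.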

Regards
Dimitre

On 27/10/2009 9.32, Niall Litchfield wrote:
> surachart
> Dimitre's point is rather more subtle than that. It isn't the
> detection of "checkpoint not complete" that is the challenge, but the
> detection of the "hang" following it. I'd call it a wait not a hang,
> but still. I don't see how you can detect that from the alert.log
> until after the event, and certainly not by monitoring checkpoint not
> complete itself. This might show up in the following wait
> events (my best guess would be the third):
> "log file switch (archiving needed)"
> "log file switch (checkpoint incomplete)"
> "log file switch completion"
> regards
> Niall (who hasn't ever been bothered enough by checkpoint incomplete
> to check)
>
> On Mon, Oct 26, 2009 at 10:23 PM, Surachart Opun <surachart_at_gmail.com> wrote:
>
> To Dimitre,
>
> About a monitoring script: you need to check for "Checkpoint not
> complete" in the alert log file.
>
> If you use Enterprise Manager, you can set:
>
> "Metric and Policy Settings" ->
> at "Generic Alert Log Error" Metric
> modify value to monitor "Checkpoint not complete"
> http://download.oracle.com/docs/cd/B19306_01/em.102/b25986/oracle_database.htm
>
> If you don't have EM, you can set up alert log error notification, for example:
> http://www.dba-oracle.com/t_alert_log_monitoring_errors.htm
>
> You can check how often the logs switch with:
> SQL> alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
> SQL> select * from v$log_history order by FIRST_TIME;
> -- compare FIRST_TIME between consecutive rows to get the switch interval.
>
> If your database switches log files frequently under normal load, you
> have to tune it:
> - Make DBWR faster: tune DBWR by enabling asynchronous I/O, using DBWR I/O
> slaves (dbwr_io_slaves) or using multiple
> processes (db_writer_processes).
> - Add more redo log files.
> - Re-create the log files with a larger size.
>
>
> Surachart Opun
> http://surachartopun.com
>
>
> On Tue, Oct 27, 2009 at 2:19 AM, Radoulov, Dimitre
> <cichomitiko_at_gmail.com> wrote:
>
>
> >>> On Mon, Oct 26, 2009 at 6:29 PM, Radoulov, Dimitre wrote:
> [...]
>
> >>> I'm trying to figure out how to implement automated
> >>> monitoring of the above-mentioned "event".
> >>> When it happens, the instance hang may become a problem, and
> >>> *I believe* that monitoring a single occurrence
> >>> of the "Checkpoint not complete" message in the alert log
> >>> is not sufficient (the time between that message
> >>> and the following thread advance is quite important as well).
> >>>
> >>> So what's the logic? How exactly do you monitor the
> >>> "Checkpoint not complete" event?
> [...]
>
> >> On 26/10/2009 15.16, Surachart Opun wrote:
> >> "Checkpoint not complete" message in the alert log
> >> The database attempts to reuse an online redo log file and
> >> cannot.
> [...]
>
>
> Hi Surachart Opun,
> thank you for your answer!
> I'm aware of the possible solutions. What I want, though, is to
> trigger a critical alert when an instance hangs
> because of this event. I'm not sure whether monitoring
> that message alone is sufficient,
> and I would like to know how you have implemented it (if
> at all).
>
>
> Regards
> Dimitre
>
>
>
>
>
> --
> Niall Litchfield
> Oracle DBA
> http://www.orawin.info

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 27 2009 - 03:42:13 CDT
