Re: Monitoring the "Checkpoint not complete" event

From: Niall Litchfield <niall.litchfield_at_gmail.com>
Date: Tue, 27 Oct 2009 08:32:51 +0000
Message-ID: <7765c8970910270132t3faf17fau70ccadd7e83af83_at_mail.gmail.com>



Surachart,

Dimitre's point is rather more subtle than that. The challenge isn't detecting "checkpoint not complete" but detecting the "hang" that follows it. I'd call it a wait, not a hang, but still. I don't see how you can detect that from the alert.log until after the event, and certainly not by monitoring checkpoint not complete itself. It might show up in the following wait events (my best guess would be the second).

"log file switch (archiving needed)"
"log file switch (checkpoint incomplete)"
"log file switch completion"
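A minimal sketch of a wait-based check in Python. It assumes you sample v$session every few seconds with the query in SAMPLE_SQL through whatever driver you use (the EVENT, STATE and SECONDS_IN_WAIT columns exist from 10g on). The function names and the 30-second threshold are hypothetical. It reports CRITICAL only when a session has been stuck on one of these events past the threshold, which is closer to the "hang" Dimitre wants to alert on than the alert.log message itself.

```python
from typing import Iterable, List, Tuple

# The three events listed above: sessions waiting here are blocked on a log switch.
LOG_SWITCH_EVENTS = frozenset({
    "log file switch (archiving needed)",
    "log file switch (checkpoint incomplete)",
    "log file switch completion",
})

# Assumed sampling query (10g+ v$session columns); run it with your own driver.
# LIKE also catches other "log file switch (...)" variants; the Python filter
# below narrows the rows to the three events listed above.
SAMPLE_SQL = """
select sid, event, seconds_in_wait
  from v$session
 where state = 'WAITING'
   and event like 'log file switch%'
"""

Row = Tuple[int, str, int]  # (sid, event, seconds_in_wait)


def stalled_sessions(rows: Iterable[Row], threshold_secs: int = 30) -> List[Row]:
    """Sampled sessions that have waited on a log switch event for at least
    threshold_secs (a hypothetical alert threshold)."""
    return [
        (sid, event, secs)
        for sid, event, secs in rows
        if event in LOG_SWITCH_EVENTS and secs >= threshold_secs
    ]


def alert_level(rows: Iterable[Row], threshold_secs: int = 30) -> str:
    """'CRITICAL' if any session is stalled past the threshold,
    'WARNING' if some are waiting but not yet past it, otherwise 'OK'."""
    rows = list(rows)
    if stalled_sessions(rows, threshold_secs):
        return "CRITICAL"
    if any(event in LOG_SWITCH_EVENTS for _, event, _ in rows):
        return "WARNING"
    return "OK"
```

Sampling rather than parsing means the alert fires while sessions are still waiting, not after the switch has finally gone through.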

regards

Niall (who hasn't ever been bothered enough by checkpoint incomplete to check)

On Mon, Oct 26, 2009 at 10:23 PM, Surachart Opun <surachart_at_gmail.com> wrote:

> To Dimitre,
>
> About the monitoring script: you need to check for "Checkpoint not complete"
> in the alert log file.
>
> If you use Enterprise Manager, you can go to
>
> "Metric and Policy Settings" ->
> the "Generic Alert Log Error" metric,
> and modify the value to match "Checkpoint not complete"
>
> http://download.oracle.com/docs/cd/B19306_01/em.102/b25986/oracle_database.htm
>
> If you don't have EM, you can set up alert log error notification as described at
> http://www.dba-oracle.com/t_alert_log_monitoring_errors.htm
>
> You can check how often the log switches with:
> SQL> alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
> SQL> select * from v$log_history order by FIRST_TIME;
> -- then check the interval between consecutive FIRST_TIME values.
>
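A minimal sketch of that interval check in Python. It assumes the FIRST_TIME values have already been fetched (driver code omitted) in the 'YYYY/MM/DD HH24:MI:SS' format set above. The function names, the sample timestamps and the five-minute min_gap are hypothetical, so pick a threshold that suits your system.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def switch_intervals(first_times: Sequence[datetime]) -> List[timedelta]:
    """Gaps between consecutive log switches (FIRST_TIME values),
    sorted first in case the rows arrive unordered."""
    ordered = sorted(first_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def too_frequent(first_times: Sequence[datetime],
                 min_gap: timedelta = timedelta(minutes=5),
                 ) -> List[Tuple[datetime, timedelta]]:
    """(switch time, gap) for every switch that came less than min_gap
    after the previous one; min_gap is a hypothetical threshold."""
    ordered = sorted(first_times)
    return [
        (later, later - earlier)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier < min_gap
    ]


# Hypothetical FIRST_TIME values, formatted as with the nls_date_format above.
FMT = "%Y/%m/%d %H:%M:%S"
times = [datetime.strptime(s, FMT) for s in (
    "2009/10/26 10:00:00",
    "2009/10/26 10:20:00",
    "2009/10/26 10:22:30",
    "2009/10/26 10:23:10",
)]

for switched_at, gap in too_frequent(times):
    print(f"{switched_at:%Y/%m/%d %H:%M:%S} came "
          f"{gap.total_seconds():.0f}s after the previous switch")
```

With these sample values, the last two switches are flagged.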
> If your database switches log files frequently under normal load, you need
> to tune it:
> - Make DBWR faster: enable asynchronous I/O, use DBWR I/O slaves
>   (dbwr_io_slaves), or use multiple writer processes (db_writer_processes).
> - Add more redo log groups.
> - Re-create the log files with a larger size.
>
>
> Surachart Opun
> http://surachartopun.com
>
>
> On Tue, Oct 27, 2009 at 2:19 AM, Radoulov, Dimitre <cichomitiko_at_gmail.com> wrote:
>
>>
>> >>> On Mon, Oct 26, 2009 at 6:29 PM, Radoulov, Dimitre wrote:
>> [...]
>>
>> >>> I'm trying to figure out how to implement automated monitoring of the
>> >>> above-mentioned "event".
>> >>> When it happens, the instance hang may become a problem, and *I believe*
>> >>> that monitoring the single occurrence of the "Checkpoint not complete"
>> >>> message in the alert log is not sufficient (the time between that
>> >>> message and the following thread advance is quite important as well).
>> >>>
>> >>> So what's the logic? How exactly do you monitor the "Checkpoint not
>> >>> complete" event?
>> [...]
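The gap Dimitre describes can at least be measured after the fact. A minimal sketch in Python, assuming a 10g-style alert.log with a timestamp line like "Mon Oct 26 18:29:01 2009" before each group of messages, an English locale for the day and month names, and the switch logged as "Thread N advanced to log sequence ...". The function name and the sample lines (including the file paths) are hypothetical. It pairs each run of "Checkpoint not complete" messages with the next thread advance and reports the elapsed time, accurate only to the timestamp granularity of the log.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumed alert.log layout: a timestamp line precedes each group of messages.
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
ADVANCE_RE = re.compile(r"^Thread \d+ advanced to log sequence \d+")


def checkpoint_stalls(lines: Iterable[str]) -> List[Tuple[datetime, timedelta]]:
    """For each run of "Checkpoint not complete", return (when it was first
    logged, time until the next "Thread N advanced to log sequence")."""
    stalls: List[Tuple[datetime, timedelta]] = []
    current: Optional[datetime] = None  # most recent timestamp line seen
    pending: Optional[datetime] = None  # start of a stall still waiting for an advance
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif line.startswith("Checkpoint not complete") and current is not None:
            if pending is None:  # keep the first message of a run
                pending = current
        elif ADVANCE_RE.match(line) and pending is not None and current is not None:
            stalls.append((pending, current - pending))
            pending = None
    return stalls


# Hypothetical alert.log excerpt, for illustration only.
SAMPLE = """\
Mon Oct 26 18:29:01 2009
Thread 1 cannot allocate new log, sequence 1234
Checkpoint not complete
  Current log# 2 seq# 1233 mem# 0: /u01/oradata/redo02.log
Mon Oct 26 18:29:45 2009
Thread 1 advanced to log sequence 1234 (LGWR switch)
  Current log# 3 seq# 1234 mem# 0: /u01/oradata/redo03.log
"""

for started, waited in checkpoint_stalls(SAMPLE.splitlines()):
    print(f"checkpoint not complete at {started}, switch after {waited}")
```

Running it periodically over the tail of the log and alerting on gaps above some threshold would cover the post-hoc part. It cannot see a stall that is still in progress, because the closing "advanced" line has not been written yet.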
>>
>> >> On 26/10/2009 15.16, Surachart Opun wrote:
>> >> "Checkpoint not complete" message in the alert log
>> >> The database attempts to reuse an online redo log file and cannot.
>> [...]
>>
>>
>> Hi Surachart Opun,
>> thank you for your answer!
>> I'm aware of the possible solutions. What I want is to trigger a critical
>> alert when an instance hangs because of this event. I'm not sure whether
>> monitoring that message alone is sufficient, and I would like to know how
>> you have implemented it (if at all).
>>
>>
>> Regards
>> Dimitre
>>
>
>

-- 
Niall Litchfield
Oracle DBA
http://www.orawin.info

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 27 2009 - 03:32:51 CDT
