Re: Measure database availability beyond 99.9%
Date: Fri, 29 Aug 2008 13:27:14 -0600
Options - You can use the database to monitor itself (or another database), but that is not going to provide 100% accuracy. What happens in the case of instance failure? Will any shutdown triggers fire? If you read the alert log, you may not find an "instance terminated" entry, so you have to guess when it went down. If you are running a health check script, what happens if the script itself fails and the failure is not trapped? Or you can use host (Unix, Windows) tools/scripts. But these may only tell you whether the SMON process is running or whether a privileged user can log in. They cannot tell you when the database is up but the network is experiencing problems, in which case the database is 'down' as far as the application is concerned.
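To make the "you have to guess when it went down" point concrete, here is a minimal Python sketch. It assumes a hypothetical external probe that writes a timestamp (in seconds) each time it connects successfully, at a fixed interval. The function name and the 1.5x jitter tolerance are my own choices, not something any tool provides. All it can do is bound each outage between a minimum and a maximum, and it cannot tell a dead probe from a dead database.

```python
from typing import List, Tuple


def outage_bounds(heartbeats: List[int], interval: int) -> List[Tuple[int, int, int, int]]:
    """Return (last_ok, next_ok, min_down, max_down) for every gap with a missed probe.

    heartbeats: sorted timestamps (seconds) of successful probes.
    interval:   nominal seconds between probes (an assumed, fixed schedule).
    """
    outages = []
    for last_ok, next_ok in zip(heartbeats, heartbeats[1:]):
        gap = next_ok - last_ok
        # Allow some scheduling jitter before declaring a probe missed.
        if gap > interval * 1.5:
            # The failure happened somewhere after last_ok and the recovery
            # somewhere before next_ok, so the true downtime is only bounded.
            min_down = max(0, gap - 2 * interval)
            max_down = gap
            outages.append((last_ok, next_ok, min_down, max_down))
    return outages


if __name__ == "__main__":
    # Probes every 10 seconds; nothing succeeded between t=20 and t=60.
    for last_ok, next_ok, lo, hi in outage_bounds([0, 10, 20, 60, 70], 10):
        print(f"down between {last_ok} and {next_ok}: {lo}s to {hi}s")
```

The uncertainty per outage is up to two probe intervals, so second-level reporting would need a probe that runs (and never fails) about every second.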
Opinion - 99.9% tracking would reveal any cumulative downtime in excess of about 8.8 hours in a year. Why would this not be sufficient precision? If you want it down to the second, you are talking about roughly 99.999997% precision (annually), since one second is about 0.000003% of a year. If outages were recorded in minutes, you could publish a 99.999% figure with an error margin of about +-0.0002% per outage, one minute being about 0.0002% of a year (or something to that effect).
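For anyone who wants to check the arithmetic, a small Python sketch follows. It assumes a 365-day year and ignores leap years; the function names are just illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, leap years ignored


def downtime_budget_seconds(availability_pct: float) -> float:
    """Downtime per year allowed by a given availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)


def availability_pct(downtime_seconds: float) -> float:
    """Annual availability percentage for a given total downtime."""
    return 100 * (1 - downtime_seconds / SECONDS_PER_YEAR)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        minutes = downtime_budget_seconds(pct) / 60
        print(f"{pct}% allows about {minutes:.1f} minutes of downtime per year")
    # Resolution of the measurement expressed as an availability figure.
    print(f"one second down per year: {availability_pct(1):.7f}%")
    print(f"one minute down per year: {availability_pct(60):.5f}%")
```

At 99.9% the budget works out to 31,536 seconds, or 8.76 hours, per year.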
Bottom line - Seconds-level precision would be difficult to monitor and would provide no real meaning. Of course, this is purely a technical perspective, and there is no doubt someone in management/marketing who wants to brag about a 99.99999999999999% uptime or include it in some contract with no real clue as to what that really means or entails.
-- 
Daniel Fink

Help me support The Children's Hospital of Denver!
I'm riding in the 2008 Courage Classic - 157 miles in 3 days
Help me reach my goal of $2,500.00 in donations.
Visit my Personal Rider Page http://www.couragetours.com/2008/danielwfink to donate

OptimalDBA.com - Oracle Performance, Diagnosis, Data Recovery and Training
OptimalDBA http://www.optimaldba.com
Oracle Blog http://optimaldba.blogspot.com
Lost Data? http://www.ora600.be/

Received on Fri Aug 29 2008 - 14:27:14 CDT

Niall Litchfield wrote:
> Aaaarrrrgh! I'm sure there's a purpose that isn't lying to justify
> expensive investments. I just cannot see it. Real HA must do service
> level monitoring (aka can the users work). What you seem to propose
> has no clear benefit; please tell me I'm wrong.
> On 28/08/2008, Ingrid Voigt <GiantPanda_at_gmx.net> wrote:
>> we are looking for a tool to measure and report the availability of our
>> databases in the HA range, i.e. with high precision. At this time we are
>> only interested in the database state, not whether the customers can work.
>> The database versions involved are 9.2 - 10.2, 11 coming next year. All
>> editions: SE1, SE and EE.
>> So far, we have been using EM Grid Control, but beyond 99.9% this is not
>> precise enough. Too many failures of the agent/the Grid Control system
>> rather than the database and too much time between "database back up"
>> and "agent notices that database is back up". A switch in the failsafe
>> clusters takes less than a minute and should be reported to the second,
>> if possible.
>> We can get startup time easily from a database trigger or the alert log,
>> but have no good way to measure shutdown time so far. Is there
>> something good available (free would be nice) or do we have to build it
>> ourselves?
>> Thanks for your help.
>> Ingrid Voigt