Don Granaman
Mon, 17 Sep 2001
That is going to be really tough - especially for 7.3.4. Absolutely "no downtime" for anything - including Oracle, OS, and application patches and upgrades - isn't feasible unless you simply prohibit them.

There are a lot of good white papers out there that talk about basic concepts and the pros/cons of specific technological solutions. One such recently revised "primer" is:

The closest you can come to "24 x forever" is with Oracle Parallel Server (OPS), but even that isn't 100% - and it will require VERY serious planning by everyone involved: DBA, SA, app dev, etc. With 8i you can come a lot closer than with 7.3.4. Transparent Application Failover (TAF) with 8i OPS helps a lot.

I'm most familiar with Sun, Veritas, EMC, and Oracle. I'm not familiar with IBM HACMP. Isn't it essentially a "takeover" solution, similar to, for example, a Veritas HA cluster? Any takeover-style "HA" solution will have a little more downtime in the event of a failure than a well-designed OPS solution - you have to remount volumes, start a new instance, etc. In addition, client failover to the HA "survivor" generally requires more explicit action by the clients.
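To make the "explicit action by the clients" concrete: after a takeover, something on the client side has to notice the dead connection and reconnect to the survivor. Below is a minimal Python sketch of that logic, assuming a hypothetical `connect` callable that raises `ConnectionError` when a host is unreachable; the host names are made up. It is not Oracle's TAF and only re-establishes a session, so any in-flight transaction would still have to be re-run by the application.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_with_failover(
    hosts: Sequence[str],
    connect: Callable[[str], Conn],
    passes: int = 3,
    delay_s: float = 1.0,
) -> Conn:
    """Try each host in order; if a whole pass fails, wait and try again.

    `connect` is a hypothetical callable that opens a session to one host
    and raises ConnectionError when that host is unreachable.
    """
    last_error: Optional[Exception] = None
    for attempt in range(passes):
        for host in hosts:
            try:
                return connect(host)
            except ConnectionError as exc:
                last_error = exc  # remember why this host failed
        if attempt < passes - 1:
            time.sleep(delay_s)  # give the survivor time to finish the takeover
    raise ConnectionError(f"no host reachable after {passes} passes") from last_error


# Usage with a stand-in connect function (hypothetical host names).
def fake_connect(host: str) -> str:
    if host == "primary":
        raise ConnectionError("primary is down")
    return f"session@{host}"


session = connect_with_failover(["primary", "survivor"], fake_connect, delay_s=0)
print(session)  # session@survivor
```

The retry-with-delay loop is there because a takeover cluster needs time to remount volumes and start the new instance; connecting to the survivor too early simply fails again.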

There are two core technical considerations in Oracle availability: instance failure and database failure. (Disaster recovery scenarios might be considered a special case of both, with additional considerations like complete site failure.) OPS and HA clusters address instance failure, but not database failure or site failure. Standby databases, geomirroring, backups, etc. address database failure, but are poor solutions for true "high availability". There is no realistic possibility of obtaining 100% uptime with only a standby database. Advanced replication is a bit of a special case - it may address both modes of failure, but has major application design and performance constraints. In reality, any expectation of 100% uptime for all possible failure scenarios is simply unrealistic - for any technology or hybrid of technologies. The best possible solution will require hybrids of technologies. (For example, perhaps an 8i/9i OPS cluster, TAF, a standby database, EMC SRDF to synchronously mirror the redo log files and the archive log destination, a DR site, and all the other required supporting infrastructure and procedures.)
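The point that no single technology covers every failure mode can be written down as a small coverage check. A Python sketch, encoding only the mapping described above; the technology names are hypothetical labels, and treating a DR site as the answer to site failure is an assumption made for illustration.

```python
from typing import Dict, FrozenSet, Iterable

# The failure modes distinguished in the text.
INSTANCE, DATABASE, SITE = "instance", "database", "site"
ALL_MODES: FrozenSet[str] = frozenset({INSTANCE, DATABASE, SITE})

# Which failure mode(s) each technology addresses, as characterized above.
# ASSUMPTION: "dr_site" covering site failure is illustrative only.
COVERAGE: Dict[str, FrozenSet[str]] = {
    "ops": frozenset({INSTANCE}),
    "ha_cluster": frozenset({INSTANCE}),
    "standby": frozenset({DATABASE}),
    "geomirroring": frozenset({DATABASE}),
    "backups": frozenset({DATABASE}),
    "dr_site": frozenset({SITE}),
}


def uncovered(solution: Iterable[str]) -> FrozenSet[str]:
    """Return the failure modes that no technology in `solution` addresses."""
    covered = frozenset().union(*(COVERAGE[tech] for tech in solution))
    return ALL_MODES - covered


print(sorted(uncovered({"ops"})))                        # ['database', 'site']
print(sorted(uncovered({"ops", "standby", "dr_site"})))  # []
```

An empty result only means every failure mode has some answer. It says nothing about how long each answer takes to kick in or how much data it loses along the way.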

The often-mentioned "five nines" (99.999% availability) is MUCH more often touted and claimed than actually attained. (Real-life example: one claimed "high availability" solution at a large insurance company consisted only of cold hardware: a powered-off spare server sitting next to the primary, with no cluster or automation software at all! The intent was simply to plug the array into the spare if the primary failed. Availability - yes. "High"? No!) True "high availability" is neither easy nor cheap. The toughest part of the job is usually getting realistic requirements and building realistic expectations. The most critical questions are: "How much data can you afford to lose?", "How long can you afford to be down?", and "How much can you afford to spend?". The first two (at least) need to be considered for each of many possible failure scenarios.
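For a sense of scale, the downtime budget implied by an availability target is simple arithmetic. A Python sketch, assuming a 365.25-day year and that every outage counts, including planned maintenance such as patches and upgrades:

```python
# Minutes in an average year (365.25 days accounts for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Total downtime per year allowed at the given availability percentage.

    Assumes all outages count against the budget, planned or not.
    """
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
```

At 99.999% the entire yearly budget is about 5.3 minutes, which is likely less than a single Oracle or OS upgrade would take.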

-Don Granaman
> We currently have 15+ databases (7.3.4 & 8i using IBM AIX and HACMP) that do
> not have a 24x7 restriction. Now, management is looking to bring in new
> products that will need to be 24X7. They are looking for costs to determine
> the viability of such a decision. I have no 24x7 experience and am looking
> for ideas or options to consider. [At least initially they are stating there can
> be no downtime for maintenance (upgrades/reorgs)]


