Re: Database High Availability

From: Richard Foote <richard.foote_at_bigpond.com>
Date: Thu, 20 Feb 2003 22:41:52 +1000
Message-ID: <XS25a.51729$jM5.130281@newsfeeds.bigpond.com>

"Chuck" <chuckh_at_softhome.net> wrote in message news:Xns93275EEB32DA8chuckhsofthomenet_at_130.133.1.4...
> "Stephen Tam" <kwtam_at_ti.com> wrote in
> news:b2vjbm$c6r$1_at_tilde.itg.ti.com:
>
> > Hi,
> > What methods are available for database HA? What do they cost?
> >
> > For a no-cost approach .....
> > Can I install a watchdog-type process on both Unix servers that checks
> > whether the instance is running on Server A and, if it is not,
> > automatically starts it on Server B?
>
> What you are describing is already offered by most unix vendors and is
> usually referred to as fail-over clustering. On AIX the product's called
> HACMP and it's a little more involved than just restarting the instance
> on another server. There are a couple of steps that need to take place
> first to allow that to happen. First you need to unmount the file systems
> from the first server and mount them on the second. You also need to move
> the IP address for the Oracle listener to the new server. HACMP (and
> other clustering solutions) handle all of this for you once you define the
> resource groups. Oracle offers a similar product to do the same thing on
> Windows called FailSafe, which by the way is free if you have a DB
> license.
>
> RAC OTOH is a product that sits on top of the vendor's clustering
> software which makes it more expensive (you need both products) and in my
> opinion the small amount of extra availability it buys you over a fail-
> over solution doesn't justify the cost for most applications. You still
> don't get 100% availability with RAC. When one instance crashes the other
> instance needs to perform recovery for the failed instance during which
> time no other processing can take place. In my classroom experience (to
> be honest I haven't used it in the real world) with the product this can
> take several minutes. Fail-over clustering is not much slower and a lot
> less expensive. It must perform the same recovery as RAC, but also has to
> move the other resources associated with the resource group to the new
> server and start the instance. My experience was that the difference
> between the two was only 1 or 2 minutes. The other downside of fail-over
> clustering is that your users lose their connections, whereas with RAC, if
> Net8 is configured correctly, they don't. They still lose any uncommitted
> transactions that were running on the failed instance, but the connection
> itself is automatically handed off to the remaining instance.
>

Hi Chuck,

An important point you've missed, though, is that with fail-over one node is sitting around, chilling out, doing zippo, while with RAC both nodes (or as many as you have configured) are earning their keep. Also, with fail-over both nodes need to be spec'ed to "production" requirements, while with RAC it's the sum of the nodes that needs to be spec'ed sufficiently (with the sum of the nodes minus one being able to cope "well enough", at least temporarily).

Whether that's enough to compensate for the extra expense is another matter ...
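
To put rough numbers on the sizing point above, here is a minimal Python sketch. It assumes the workload can be expressed in abstract "capacity units" and that load spreads evenly across RAC nodes; the function names and the degraded_fraction parameter (how much of the workload the surviving nodes must still carry, i.e. "well enough") are made up for illustration and are not part of any Oracle tool.

```python
def failover_sizing(workload):
    """Two-node fail-over pair: each node must carry the whole workload alone."""
    per_node = float(workload)
    return {"per_node": per_node, "installed": 2 * per_node}


def rac_sizing(workload, nodes, degraded_fraction=1.0):
    """RAC-style sizing under the stated assumptions (even load spread).

    All nodes together must carry the full workload, and the remaining
    nodes - 1 must carry degraded_fraction of it after one node fails.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to survive a failure")
    if not 0 < degraded_fraction <= 1:
        raise ValueError("degraded_fraction must be in (0, 1]")
    normal = workload / nodes                               # all nodes up
    degraded = degraded_fraction * workload / (nodes - 1)   # one node down
    per_node = max(normal, degraded)
    return {"per_node": per_node, "installed": nodes * per_node}


if __name__ == "__main__":
    print(failover_sizing(100))
    print(rac_sizing(100, nodes=4, degraded_fraction=0.75))
    print(rac_sizing(100, nodes=2, degraded_fraction=1.0))
```

For a workload of 100 units, the fail-over pair installs 200 units of capacity, while four RAC nodes that only need to handle 75% of the load after a failure install 100. A two-node RAC that must carry the full load on one node ends up at the same 200 as fail-over, so the saving depends on the node count and on how much degradation you can tolerate.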

Cheers

Richard

Received on Thu Feb 20 2003 - 06:41:52 CST