Re: Common Oracle RDBMS Misconceptions

From: Jeremiah Wilton <jwilton_at_speakeasy.net>
Date: Wed, 27 Jun 2001 10:48:26 -0700
Message-ID: <F001.0033A386.20010627090125@fatcity.com>

On Tue, 26 Jun 2001, Stephane Faroult wrote:

> But in practice, why would you switch to the standby database, unless
> the primary database is crashed or worse?

Hardware replace/repair
Move to a larger host
O/S upgrade
File layout revision
Planned/impending infrastructure outage
Database problem in which datafiles are corrupted but redologs are not
Frequent memory faults
Any chronic but not terminal host-related problem
Migration to a new I/O subsystem

> You know how it is in a production environment, the database
> crashes. Even if failover is easy, you always have to instruct users
> to connect as scott/tiger_at_backup instead of scott/tiger_at_prod - or
> perhaps modify the tnsnames.ora to make it transparent, or perhaps
> play with IP addresses which may mean trouble for a while with
> in-memory routing tables etc.

Well, I guess the presumption is that if someone went to the trouble of setting up a standby, they would actually have a way to point people at the standby in the event of a failover. As you point out, there are a number of scriptable solutions that are suitable. The easiest, and one you mention, is IP address assumption. This is the same method that is used by HA cluster solutions like Veritas HA, HP MC Serviceguard and Compaq TruCluster. It is easy to script, manage and execute. Contrary to your belief, no problems with "in-memory routing tables" arise, and the change is immediate. It is a simple matter of 'ifconfig delete' on the old host and 'ifconfig alias' on the new one. These actions take all necessary steps to notify routers and switches on your subnet that the MAC address of the IP has changed and that packets should be routed accordingly. Using a dedicated IP address just for the database service is a good idea even if you aren't building a standby or HA solution. It comes in handy if you ever decide you want to rehost the database.
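
To make the IP assumption step concrete, here is a rough sketch of how the two ifconfig calls might be wrapped in a script. It is only an illustration: the interface name, the address, the netmask and the BSD-style argument order are all assumptions, and the exact ifconfig syntax differs between platforms, so check the man page on your own hosts. By default it only prints the commands instead of running them.

```python
# Sketch only: builds (and optionally runs) the two ifconfig steps for moving
# a dedicated database service IP from the old host to the standby host.
# Interface names, addresses and the BSD-style argument order are assumptions.
import subprocess


def release_cmd(iface, service_ip):
    """Command for the old primary: drop the service address."""
    return ["ifconfig", iface, "delete", service_ip]


def assume_cmd(iface, service_ip, netmask):
    """Command for the standby: add the service address as an alias."""
    return ["ifconfig", iface, "alias", service_ip, "netmask", netmask]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    run(release_cmd("en0", "10.0.0.50"))                   # on the old primary
    run(assume_cmd("en0", "10.0.0.50", "255.255.255.0"))   # on the standby
```

Keeping the command construction separate from execution means the same script can be rehearsed in dry-run mode long before anyone needs it for real.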

> My point is that, even if the switch can be quasi-immediate, it is
> not so easy, so people will naturally try to make the main machine
> work first, there will be some delay assessing the damage, waiting
> until 2am to ring the VP in his bed to get the authorization to
> switch, etc.

Well, not everyone has to have authorization from a VP to fail over to a standby. The endless troubleshooting is a real problem. Many sites set an upper bound on the time spent on diagnostics, and require a failover (if a failover is appropriate) after some number of minutes. The various failover scenarios are scripted and packaged in advance. You don't rush around trying to figure it out when the system is failing.
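
The time limit on diagnostics can be written down as a simple rule, so that nobody has to argue about it at 2am. This is a minimal sketch, not anyone's actual procedure: the function name, the 15-minute limit and the "failover_appropriate" flag are hypothetical and stand in for whatever your site's runbook defines.

```python
# Sketch only: a time-boxed diagnostics rule. The limit and the
# failover_appropriate flag are hypothetical stand-ins for site policy.
from datetime import datetime, timedelta


def should_fail_over(outage_start, now, limit_minutes, failover_appropriate):
    """Return True once the diagnostic time limit has been used up,
    provided a failover is an appropriate response to this outage."""
    if not failover_appropriate:
        return False
    return now - outage_start >= timedelta(minutes=limit_minutes)


if __name__ == "__main__":
    start = datetime(2001, 6, 27, 2, 0)
    # Ten minutes into the outage with a 15-minute limit: keep diagnosing.
    print(should_fail_over(start, start + timedelta(minutes=10), 15, True))
    # Twenty minutes in: stop diagnosing and run the packaged failover.
    print(should_fail_over(start, start + timedelta(minutes=20), 15, True))
```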

> In real life, half-an-hour or an hour is easily passed before
> everybody is back at work on the backup machine, busy trying to
> catch up on the wasted time. Do not forget that since the
> transmission of redo logs is asynchronous (I have heard about
> improvements with 9i) some transactions - committed ones - will have
> been lost, so users will have to check and probably reenter the
> missing transactions. At this point the main machine will probably
> be totally out of order. Wait another 2 or 4 hours to have somebody
> to come if it's a hardware problem, I guess that when everything is
> over everybody will be on their knees and the last thing they will
> have in mind is make the old primary database the new standby -
> assuming of course that all files are intact. And even if the
> ex-standby machine is possibly less powerful, everybody will
> probably wait until a quieter time, say the W/E, to switch back to
> the initial configuration. At which time, in all likelihood, a full
> database copy will have become necessary; I think that the simple
> fact of having reentered a couple of transactions not transmitted
> yet to the standby database would require it. Do I err ?

Basically, the only time you wouldn't do a graceful failover is in the rare event that you didn't have access to the last few logs the primary had written. In that case, you would be forced to activate the standby database as of the time of the last log you have. This is one of the risks you take with a standby database, and the standby must be presented to others within the company as a redundant solution that may result in the loss of some large number of transactions, depending on how often the standby pulls logs. There must be a contingency plan in place to handle this eventuality that takes your application and data into account.
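
To give a feel for the size of that exposure, here is a back-of-the-envelope estimate. It assumes a simplified model that is mine, not Oracle's: the standby pulls archived logs at a fixed interval, the current online log reaches the standby only after a log switch, and the transaction rate is constant. All the numbers are hypothetical.

```python
# Sketch only: worst-case loss when the standby must be activated without
# the last logs from the primary. Simplified model, hypothetical numbers.

def worst_case_loss(pull_interval_s, log_switch_interval_s, tx_per_second):
    """Return (exposure window in seconds, transactions at risk).

    Worst case: the primary is lost just before the next pull, so every
    log archived since the last pull is missing, plus whatever was already
    sitting in the online log at the time of that last pull.
    """
    window_s = pull_interval_s + log_switch_interval_s
    return window_s, window_s * tx_per_second


if __name__ == "__main__":
    # Pull every 5 minutes, log switch every 10 minutes, 5 transactions/sec.
    window, lost = worst_case_loss(300, 600, 5)
    print(window, lost)
```

Numbers like these are what the contingency plan has to be sized for, since someone will have to find and reenter those transactions.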

Synchronous log update on the standby side is available in 9.0, and available on previous versions using third-party technologies such as EMC SRDF or Veritas SRVM. These products can be employed to mirror the online logs and controlfile, in order to create a no-loss standby. The problem with this configuration is that it makes the primary beholden to network latency for log writes. This can have a significant impact on performance.
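
The cost of that latency is easy to estimate roughly. The sketch below assumes a deliberately crude model: each synchronous log write waits for the local write plus one network round trip to the remote mirror, and writes from one stream are strictly serial. The latencies are hypothetical, and real products batch and pipeline writes, so treat the result as an order of magnitude only.

```python
# Sketch only: crude model of synchronous log mirroring. Each log write
# waits for the local write plus one round trip to the remote site.
# Latency figures are hypothetical.

def max_serial_log_writes_per_sec(local_write_ms, round_trip_ms):
    """Upper bound on strictly serial log writes per second."""
    return 1000.0 / (local_write_ms + round_trip_ms)


if __name__ == "__main__":
    # Local only: 1 ms per write.
    print(max_serial_log_writes_per_sec(1.0, 0.0))
    # Same disk, plus a 4 ms round trip to the remote mirror.
    print(max_serial_log_writes_per_sec(1.0, 4.0))
```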

I discuss *all* of these considerations at some length in my HA paper on my site.

http://www.speakeasy.org/~jwilton/241.pdf

--
Jeremiah Wilton
http://www.speakeasy.net/~jwilton


Received on Wed Jun 27 2001 - 12:48:26 CDT
