Re: compaq tru64 & upgrade 8i to 9i

From: Don Granaman <granaman_at_home.com>
Date: Sat, 10 Nov 2001 03:39:00 -0800
Message-ID: <F001.003C194F.20011110031518@fatcity.com>

I have no experience at all with Tru64 and no production experience with 9i RAC yet, but lots of experience with 7.3, 8.0.x, and 8.1.x OPS on Unix.

You are correct that if an instance fails, others will continue to process - after a very short brownout to perform lock remastering, shared resource reallocation, and instance recovery on behalf of the failed instance.

If you use 8i's Net8 or 9i's Net Services to connect, you can set it up so that:
1) New connections will automatically be directed to a surviving instance.
2) Existing sessions against a failed instance will automatically fail over to a surviving instance.
3) In-flight queries against a failed instance will automatically resume against a surviving instance.
See the docs on TAF and Net Services at technet.oracle.com. Also, see MetaLink Doc ID: 97926.1 on the limitations of TAF! I am not sure all of it still applies in 9i, but it should be "required reading" for 8i TAF - at least. (i.e. Don't swallow that marketing kool-aid without reading the ingredients! "Transparent application failover" isn't as "transparent" as the propaganda might lead you to believe.)
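
As a rough illustration of the three behaviours listed above, here is a minimal Python sketch that renders a tnsnames.ora alias with connect-time failover and TAF. The helper name taf_tns_entry, the alias, the service name, the host names, and the RETRIES/DELAY values are all made up for the example. FAILOVER = ON in the address list covers new connections; in FAILOVER_MODE, TYPE = SESSION re-establishes sessions, and TYPE = SELECT also lets open queries resume. Check the result against the Net8 / Net Services docs for your release before relying on it.

```python
def taf_tns_entry(alias, service_name, hosts, port=1521,
                  failover_type="SELECT", method="BASIC",
                  retries=20, delay=15):
    """Render a tnsnames.ora alias with connect-time failover and TAF.

    All argument values are illustrative assumptions, not recommendations.
    """
    # One ADDRESS line per node that can service the connection.
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {h})(PORT = {port}))"
        for h in hosts
    )
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (LOAD_BALANCE = ON)\n"
        "      (FAILOVER = ON)\n"          # new connections try the next address
        f"{addresses}\n"
        "    )\n"
        "    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service_name})\n"
        # TYPE = SESSION reconnects only; TYPE = SELECT also resumes open queries.
        f"      (FAILOVER_MODE = (TYPE = {failover_type})(METHOD = {method})"
        f"(RETRIES = {retries})(DELAY = {delay}))\n"
        "    )\n"
        "  )\n"
    )


print(taf_tns_entry("OPS", "ops.example.com", ["nodeA", "nodeB"]))
```

Note that nothing here covers in-flight transactions (uncommitted DML); that is one of the limitations the MetaLink note is about.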

You do not need to do anything special to recover a crashed instance. LMON will signal SMON on a survivor to do it automatically - and rather quickly. There are things you might want to consider to bound recovery time though...

What are the (potential) pitfalls with RAC? Let me count the ways... Hmmm... N*10**M (where N,M >> 1)

Isn't it kind of new?
I wouldn't feel comfortable putting anything very critical into production on RAC just yet, and I've been doing heavy-duty 7, 8, & 8i OPS for years. (Or, as in the old Life cereal commercial, "Let Mikey try it!")
System & application testing
Whatever time you think you will need for testing - quadruple it. Examine your test plan with an electron microscope to make sure you exercise every molecule of the application and the system - and every possible combination of those molecules. Especially if you, your shop, and/or your application have no OPS experience.
Multiple redo threads & database creation
Not exactly a pitfall, but necessary and not always well-understood up front. For example, redo group numbers must be unique across all instances. Typically, the general outline for installation and creation is to install the software, mount an instance in exclusive mode, create the database, enable other threads, create redo groups for other threads, shut down, change init.ora to enable RAC, start instances. By the way, the cluster install, for 8i on Solaris at least, never did work quite correctly. Every time, with every 8i version, there were some hoops that had to be jumped through afterward to finish the job. (The phrase "Java-infested piece of junk" springs to mind.)
ifile=
Put all the parameters common to all instances in one file, reference it from each instance's init.ora with ifile=, and share or copy it everywhere so you don't have N sets of common settings to maintain.
Deadlock resolution
(Example, from 8i, but also applies to 9i.) Deadlocks may be global in a parallel environment, so they are not handled by the usual exclusive-Oracle process, but rather by an OPS/RAC-specific process - LMD - and it may have a different timeout! I have seen a case where an application experienced a few deadlocks under exclusive Oracle, but nothing severe enough to cause a noticeable system-wide performance problem. When ported to 8.1.7.1 OPS, deadlocks would bring the system to its knees! Shutting down the second instance, setting PARALLEL_SERVER=FALSE, and bouncing the primary instance cured it. The difference? A one minute deadlock timeout in exclusive Oracle versus a ten minute deadlock timeout in 8.1.7.1 OPS!
PCM lock configuration
Assume the defaults. Only override them for a very compelling reason. Why? GC_FILES_TO_LOCKS and perhaps some other overrides (I'm not intimately familiar with RAC yet) will disable cache fusion entirely - system wide, not just for the stuff in the files explicitly mentioned!
DBA toolbox
Are all your 3rd-party tools OPS/RAC-aware? Most aren't. Some won't even work correctly in a parallel environment! Have you looked through GV$<everything> to get acquainted? Do you have all the home-grown tuning, monitoring, & reporting scripts you might need for RAC?
Backup
You typically want to run backups from one node - datafiles & archive logs. So, the archive log destinations for all nodes should be accessible (for read) from (at least) the node where backups are done. Typically via NFS mount, but Tru64 has shared filesystems.
Restore
Assume you have two nodes and two instances - OPS1 runs on nodeA and OPS2 runs on nodeB. If nodeA dies a horrible death and corrupts datafiles on its way down, you will need to perform a restore from nodeB. Make sure that your media management layer is able to restore files via any node/instance. For example, with Veritas NetBackup, this is a bp.conf modification. (Don't know anything about Tru64 specifics.)
Recovery
You have managed to restore the corrupted datafile(s) via nodeB. Now you need to do recovery - and you need the archive logs from ALL instances to do it. If the archive dest for OPS1 is on disk local to nodeA, you are due for a hardware shuffle (if even possible). If the archive destinations for all instances are also on shared storage (not absolutely required), you can (typically) import the volume group to nodeB (with a cluster filesystem on Tru64, it's a "no-brainer") and do the recovery.
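
The archive-destination point is easy to sanity-check up front. A minimal Python sketch, assuming you already have each instance's archive destination and a list of mount points you know to be on shared storage (the function name, paths, and mount points below are hypothetical):

```python
import posixpath


def unshared_archive_dests(archive_dests, shared_mounts):
    """Return {instance: dest} for archive destinations NOT under a shared mount.

    archive_dests: instance name -> archive log directory (hypothetical values).
    shared_mounts: mount points known to be visible from every node.
    """
    def under(path, mount):
        # Compare normalized paths so "/shared" does not match "/shared2".
        path = posixpath.normpath(path)
        mount = posixpath.normpath(mount)
        return path == mount or path.startswith(mount.rstrip("/") + "/")

    return {
        inst: dest
        for inst, dest in archive_dests.items()
        if not any(under(dest, m) for m in shared_mounts)
    }


dests = {"OPS1": "/u01/local/arch/OPS1", "OPS2": "/shared/arch/OPS2"}
print(unshared_archive_dests(dests, ["/shared"]))
```

Anything this reports is a destination that a surviving node could not read if the owning node died. It only compares path names; it does not verify that the mount really is shared.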

Generic moral of the backup/recovery stuff: put ALL archive destinations on shared storage - and on shared filesystems if possible. Also, test the bejeebers out of backup/restore/recovery.
(Very likely, there WILL be initial bejeebers!)
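
Going back to the redo thread item in the list above: the "create redo groups for other threads, enable other threads" steps can be sketched as generated SQL. This is a minimal Python sketch, not a tested procedure. The function name, file paths, and log size are assumptions; the one rule it encodes from the text is that group numbers must not collide with any group already used by another thread.

```python
def add_thread_sql(thread, existing_groups, members, size="100M"):
    """Build ALTER DATABASE statements that add and enable a redo thread.

    existing_groups: group numbers already used by ANY thread.
    members: one redo log file path per new group (hypothetical paths).
    """
    # Start numbering after the highest group in use, so groups stay unique
    # across all threads/instances.
    next_group = max(existing_groups, default=0) + 1
    stmts = []
    for offset, path in enumerate(members):
        group = next_group + offset
        stmts.append(
            f"ALTER DATABASE ADD LOGFILE THREAD {thread} GROUP {group} "
            f"('{path}') SIZE {size};"
        )
    # The thread must be enabled before a second instance can use it.
    stmts.append(f"ALTER DATABASE ENABLE PUBLIC THREAD {thread};")
    return stmts


for stmt in add_thread_sql(2, [1, 2, 3],
                           ["/shared/redo/t2_g4.log", "/shared/redo/t2_g5.log"]):
    print(stmt)
```

You would run the generated statements from the first instance while it is still running exclusive, then set the thread for the second instance in its init.ora before starting it.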

Only (N*10**M - 10) more potential pitfalls to consider...
(Want to hire a consultant? I'm available! ;-)

-Don Granaman
[OPS OraSaurus]

----- Original Message -----
To: "Multiple recipients of list ORACLE-L" <ORACLE-L_at_fatcity.com>
Sent: Thursday, November 08, 2001 8:55 AM

Hi All,

I might be looking at an upgrade of an oracle 8i database to oracle 9i using RAC.
Has anybody got any experience with this already??

What are the pitfalls with RAC (I don't even have experience with parallel server yet).
I know that the database upgrade in itself is not that hard but I have to also install RAC................

Are we correct in assuming that if one instance fails with RAC the others keep on processing the connections (including the new ones)? Do we need to do anything special to recover the crashed instance?

Any comment is greatly appreciated

TIA Jack


Received on Sat Nov 10 2001 - 05:39:00 CST