I have no experience at all with Tru64 and no production experience
with 9i RAC yet, but lots of experience with 7.3, 8.0.x, and 8.1.x OPS
on Unix.
You are correct that if an instance fails, others will continue to
process - after a very short brownout to perform lock remastering,
shared resource reallocation, and instance recovery on behalf of the
failed instance.
If you use 8i's Net8 or 9i's Net Services to connect, you can set it
up so that:
1) New connections will automatically be directed to a surviving
instance
2) Existing sessions against a failed instance will automatically
fail over to a surviving instance
3) In-flight queries against a failed instance will automatically
resume against a surviving instance.
See the docs on TAF and Net Services at technet.oracle.com. Also, see
MetaLink Doc ID: 97926.1 on the limitations of TAF! I am not sure all
of it still applies in 9i, but it should be "required reading" for 8i
TAF - at least. (i.e. Don't swallow that marketing kool-aid without
reading the ingredients! "Transparent application failover" isn't as
"transparent" as the propaganda might lead you to believe.)
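
One way to see whether sessions actually picked up TAF settings is to look at the failover columns in GV$SESSION. This is only a sketch: it assumes 8i or later, where those columns exist, and the USERNAME filter is just one convenient way to skip background processes.

```sql
-- Sketch: list user sessions on every instance with their TAF settings.
-- A FAILOVER_TYPE of NONE means the session connected without TAF.
SELECT inst_id,
       username,
       failover_type,
       failover_method,
       failed_over
  FROM gv$session
 WHERE username IS NOT NULL
 ORDER BY inst_id, username;
```

After a test failover, FAILED_OVER should read YES for the sessions that moved. It says nothing about what happened to any uncommitted work in them.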
You do not need to do anything special to recover a crashed instance.
LMON will signal SMON on a survivor to do it automatically - and
rather quickly. There are things you might want to consider to bound
recovery time though...
What are the (potential) pitfalls with RAC? Let me count the ways...
Hmmm... N*10**M (where N,M >> 1)
- Isn't it kind of new? I wouldn't feel comfortable putting anything
very critical into production on RAC just yet and I've been doing
heavy-duty 7,8, & 8i OPS for years. (Or, as in the old Life cereal
commercial, "Let Mikey try it!")
- System & application testing
Whatever time you think you will need for testing - quadruple it.
Examine your test plan with an electron microscope to make sure you
exercise every molecule of the application and the system - and every
possible combination of those molecules. Especially if you, your
shop, and/or your application have no OPS experience.
- Multiple redo threads & database creation
Not exactly a pitfall, but necessary and not always well-understood up
front. For example, redo group numbers must be unique across all
instances. Typically, the general outline for installation and
creation is to install the software, mount an instance in exclusive
mode, create the database, create redo groups for the other threads,
enable those threads, shut down, change init.ora to enable RAC, start
instances. By the way, the cluster install, for 8i on Solaris at
least, never did work quite correctly. Every time, with every 8i
version, there were some hoops that had to be jumped through afterward
to finish the job. (The phrase "Java-infested piece of junk" springs
to mind.)
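
The redo-thread part of that outline can be sketched roughly as follows. The file names, sizes, and group numbers are invented for illustration (on OPS these would usually be raw devices). The point is that thread 2's groups continue the numbering instead of reusing thread 1's.

```sql
-- Assumption: thread 1 (created with the database) already owns groups 1-3.
-- Group numbers are database-wide, so thread 2 starts at 4.
ALTER DATABASE ADD LOGFILE THREAD 2
  GROUP 4 ('/oradata/ops/redo_t2_g4.log') SIZE 100M,
  GROUP 5 ('/oradata/ops/redo_t2_g5.log') SIZE 100M,
  GROUP 6 ('/oradata/ops/redo_t2_g6.log') SIZE 100M;

-- A thread needs at least two groups before it can be enabled.
ALTER DATABASE ENABLE PUBLIC THREAD 2;
```

The second instance then picks up its thread through the THREAD parameter in its own init.ora.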
- ifile=
Put all the stuff common to all instances in this and share or copy it
everywhere so you don't have N copies of it to maintain.
- Deadlock resolution
(Example, from 8i, but also applies to 9i.) Deadlocks may be global
in a parallel environment, so they are not handled by the usual
exclusive-Oracle process, but rather by an OPS/RAC-specific process -
LMD - and it may have a different timeout! I have seen a case where
an application experienced a few deadlocks under exclusive Oracle,
but nothing severe enough to cause a noticeable system-wide
performance problem. When ported to 8.1.7.1 OPS, deadlocks would
bring the system to its knees! Shutting down the second instance,
setting PARALLEL_SERVER=FALSE, and bouncing the primary instance cured
it. The difference? A one minute deadlock timeout in exclusive
Oracle versus a ten minute deadlock timeout in 8.1.7.1 OPS!
- PCM lock configuration
Assume the defaults. Only override them for a very compelling reason.
Why? GC_FILES_TO_LOCKS and perhaps some other overrides (I'm not
intimately familiar with RAC yet) will disable cache fusion entirely -
system wide, not just for the stuff in the files explicitly mentioned!
- DBA toolbox
Are all your 3rd-party tools OPS/RAC-aware? Most aren't. Some won't
even work correctly in a parallel environment! Have you looked
through GV$<everything> to get acquainted? Do you have all the
home-grown tuning, monitoring, & reporting scripts you might need for
RAC?
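
As a starting point for getting acquainted, here are two small GV$ queries. They are a sketch of the kind of thing a home-grown script might run, not a monitoring suite. INST_ID is the column the GV$ views add on top of their V$ counterparts.

```sql
-- Which instances are up, and on which hosts?
SELECT inst_id, instance_name, host_name, status
  FROM gv$instance
 ORDER BY inst_id;

-- Session count per instance (a crude look at how load is spread).
SELECT inst_id, COUNT(*) AS sessions
  FROM gv$session
 GROUP BY inst_id
 ORDER BY inst_id;
```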
- Backup
You typically want to run backups from one node - datafiles & archive
logs. So, the archive log destinations for all nodes should be
accessible (for read) from (at least) the node where backups are done.
Typically via NFS mount, but Tru64 has shared filesystems.
- Restore
Assume you have two nodes and two instances - OPS1 runs on nodeA and
OPS2 runs on nodeB. If nodeA dies a horrible death and corrupts
datafiles on its way down, you will need to perform a restore from
nodeB. Make sure that your media management layer is able to restore
files via any node/instance. For example, with Veritas NetBackup,
this is a bp.conf modification. (Don't know anything about Tru64
specifics.)
- Recovery
You have managed to restore the corrupted datafile(s) via nodeB. Now
you need to do recovery - and you need the archive logs from ALL
instances to do it. If the archive dest for OPS1 is on disk local to
nodeA, you are due for a hardware shuffle (if even possible). If the
archive destinations for all instances are also on shared storage (not
absolutely required), you can (typically) import the volume group to
nodeB (with a cluster filesystem on Tru64, it's a "no-brainer") and do
the recovery.
Generic moral of the backup/recovery stuff: put ALL archive
destinations on shared storage - and on shared filesystems if
possible. Also, test the bejeebers out of backup/restore/recovery.
(Very likely, there WILL be initial bejeebers!)
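
A simple check before trusting the setup is to list recent archive logs for every thread from the node that runs the backups, then confirm that each NAME is actually readable from that node. This is a sketch, and the one-day window is an arbitrary example.

```sql
-- Sketch: archived logs from every thread over the last day.
-- V$ARCHIVED_LOG is read from the (shared) control file, so it
-- lists all threads no matter which instance runs the query.
SELECT thread#, sequence#, name, completion_time
  FROM v$archived_log
 WHERE completion_time > SYSDATE - 1
 ORDER BY thread#, sequence#;
```

A thread that is missing, or that has gaps in SEQUENCE#, is worth chasing down before a real recovery depends on it.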
Only (N*10**M - 10) more potential pitfalls to consider...
(Want to hire a consultant? I'm available! ;-)
-Don Granaman
[OPS OraSaurus]
----- Original Message -----
To: "Multiple recipients of list ORACLE-L" <ORACLE-L_at_fatcity.com>
Sent: Thursday, November 08, 2001 8:55 AM
Hi All,
I might be looking at an upgrade of an oracle 8i database to oracle 9i
using RAC.
Has anybody got any experience with this already??
What are the pitfalls with RAC (I don't even have experience with
parallel server yet).
I know that the database upgrade in itself is not that hard but I have
to also install RAC................
Are we correct in assuming that if one instance fails with RAC the
others keep on processing the connections (including the new ones)?
Do we need to do anything special to recover the crashed instance ?
Any comment is greatly appreciated
TIA
Jack
Author: Don Granaman
INET: granaman_at_home.com
Received on Sat Nov 10 2001 - 05:39:00 CST