New challenge - clustering - do I understand this correctly?

From: Ed Stevens <nospam_at_noway.nohow>
Date: Mon, 11 Aug 2003 17:29:12 -0500
Message-ID: <jo4gjvsv17mr0j8jqo2p7a04pbb5c6s7bg@4ax.com>

Digging a bit deeper into my first clustering / HA assignment. Let me lay this out . . . please bear with the length, but I want to provide sufficient info.

First things first: Oracle 8.1.7.

The project is laid out to use 2 Sunfire servers "installed in a cluster configuration w/ Solaris 8 and Veritas Clustering SW": Veritas clustering software, Oracle HA, Oracle 8.1.7. There is also a middleware app that will be running on these servers.

All of the docs I read on the Sun site were installation guides (Sun Cluster Data Service for Oracle), so I had to "intuit" what it actually *does*. I also did a search of the ng archives and did some reading there.

I have never dealt with clusters, nor have I ever dealt with OPS.

Here's my understanding, based on my reading:

With Sun Cluster Data Service for Oracle, all nodes of the cluster share the same disks, i.e. the same set of db files. One Oracle instance runs on one node of the cluster. The other nodes of the cluster periodically ping the instance. If they discover that the instance has failed, one of the remaining nodes will fire up an instance that is a duplicate of the failed one, and this instance will perform crash recovery. Apps are responsible for their own recovery from a lost connection, but when they reconnect they will be connected to the new instance on the server that picked up the load. The switch is done by the clustering software (OS), and SQLNet won't even be aware of it.
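To check that sequence (probe, declare failure, start the same instance on a survivor, crash recovery, clients reconnect), here is a minimal toy model in Python. It is a sketch under stated assumptions, not the Sun Cluster or Veritas API: the Node and FailoverCluster classes, the three-missed-probes threshold and the "logical hostname" that moves with the instance are all hypothetical illustrations. The logical hostname stands in for whatever virtual address the cluster software moves so that SQLNet keeps using the same address; check the agent documentation for how your product actually does it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node; at most one node hosts the Oracle instance at a time."""
    name: str
    alive: bool = True
    hosts_instance: bool = False


@dataclass
class FailoverCluster:
    """Toy active/passive cluster over shared disks (hypothetical model).

    Assumes exactly one node starts out hosting the instance.
    """
    nodes: list
    max_missed: int = 3            # consecutive failed probes before failover
    missed: int = 0
    logical_host_owner: str = ""   # node currently answering the logical hostname
    events: list = field(default_factory=list)

    def __post_init__(self):
        self.logical_host_owner = self.active().name

    def active(self):
        """Return the node currently running the instance."""
        return next(n for n in self.nodes if n.hosts_instance)

    def probe(self):
        """Run one monitoring cycle; return True if a failover happened."""
        primary = self.active()
        if primary.alive:
            self.missed = 0
            return False
        self.missed += 1
        if self.missed < self.max_missed:
            return False           # tolerate transient misses
        survivor = next((n for n in self.nodes if n.alive), None)
        if survivor is None:
            self.events.append("no surviving node: database is down")
            return False
        # Same instance, same shared db files, now started on the survivor.
        primary.hosts_instance = False
        survivor.hosts_instance = True
        self.events.append(f"{survivor.name}: start instance, crash recovery")
        # Move the logical hostname so clients keep using the same address.
        self.logical_host_owner = survivor.name
        self.events.append(f"logical hostname moved to {survivor.name}")
        self.missed = 0
        return True

    def client_connect(self):
        """Clients connect via the logical hostname, never a physical node."""
        node = self.active()
        if not node.alive:
            raise ConnectionError("instance unavailable; retry later")
        return node.name


if __name__ == "__main__":
    a, b = Node("node-a", hosts_instance=True), Node("node-b")
    cluster = FailoverCluster([a, b])
    print(cluster.client_connect())   # node-a
    a.alive = False                   # node-a crashes
    for _ in range(cluster.max_missed):
        cluster.probe()
    print(cluster.client_connect())   # node-b
    print(cluster.events)
```

Between the crash and the probe that triggers the takeover, client_connect raises ConnectionError: that is the outage window in which the apps have to handle their own reconnect.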

With OPS, you have two instances running concurrently against the same database. Both can be accessed at the same time, for load balancing or whatever. If one fails, SQLNet (if correctly configured) will reconnect everything to the surviving instance.
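On the OPS side, "correctly configured" SQLNet typically means a client connect descriptor in tnsnames.ora that lists the addresses of both instances (FAILOVER=ON for connect-time failover, plus a FAILOVER_MODE clause for Transparent Application Failover). The Python below is not Oracle code; it is a minimal sketch of that client-side logic only. The Instance, connect and Session names are hypothetical, and the session ignores transactions entirely, so it says nothing about what happens to uncommitted work when an instance dies.

```python
import random


class Instance:
    """One OPS instance; all instances open the same database."""
    def __init__(self, name):
        self.name = name
        self.up = True


def connect(address_list, load_balance=False, rng=None):
    """Connect-time failover: try each listed instance until one answers."""
    candidates = list(address_list)
    if load_balance:
        # Randomise the order so new sessions spread over the instances.
        (rng or random).shuffle(candidates)
    for inst in candidates:
        if inst.up:
            return inst
    raise ConnectionError("no instance of the database is reachable")


class Session:
    """Client session that reconnects to a surviving instance when its own
    instance dies (loosely modelled on TAF; transactions are ignored)."""
    def __init__(self, address_list):
        self.address_list = list(address_list)
        self.instance = connect(self.address_list)

    def execute(self, sql):
        if not self.instance.up:
            # Lost our instance: fail over through the same address list.
            self.instance = connect(self.address_list)
        return f"{sql!r} ran on {self.instance.name}"


if __name__ == "__main__":
    ops1, ops2 = Instance("ops1"), Instance("ops2")
    session = Session([ops1, ops2])
    print(session.execute("select 1 from dual"))  # 'select 1 from dual' ran on ops1
    ops1.up = False                               # instance ops1 crashes
    print(session.execute("select 1 from dual"))  # 'select 1 from dual' ran on ops2
```

Passing load_balance=True mimics spreading new connections across both instances; unlike the failover model, there is no instance startup or crash recovery on the client's path, only a reconnect to an instance that is already running.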

Do I have this right?

If so, are there pros and cons that might not be immediately obvious? What I see is that the HA option can result in a few minutes of outage during the switch, but is much easier for the DBA to administer: it would be handled just like any other single-machine DB, and all cluster activity and recovery from node failure is handled by the OS, transparently to Oracle. On the other hand, the OPS option is more complicated for the DBA to administer, but can result in near-zero downtime.

TIA

Received on Mon Aug 11 2003 - 17:29:12 CDT