Re: RAC or Large SMP...?

From: <mccmx_at_hotmail.com>
Date: Thu, 9 Oct 2008 18:39:07 -0700 (PDT)
Message-ID: <951152d1-95c5-42eb-8bb4-8d91aa14d34c@s9g2000prg.googlegroups.com>


> I'm also not convinced that the fewer servers are easier to administer
> argument is as valid these days. This was certainly true in the past,
> but modern package management has become quite sophisticated.
> Managing larger numbers of servers dedicated to the same role isn't that
> much of an overhead anymore. At least we haven't seen a substantial
> increase in administration since moving to RAC. In fact, the added
> fault tolerance has reduced impact and stress on staff when hardware
> failures occur.
>
> Tim

It's exactly this area of RAC (i.e. administration) that concerns me. In your experience, does the following scenario sound familiar:

"Ah yes, troubleshooting. I’ve seen many clusters that just froze for no apparent
reason in my time. It’s always possible to make the OS or Cluster software dump a
trace/log file when it happens.

The resulting trace/log file from the cluster will normally be the size of Texas, and
only one or two people in the entire vendor organisation can truly understand them,
you will be told.

Then the files (often with sizes measured in GB) are shipped to the vendor and some
months later they will report back that it wasn’t possible to pinpoint the exact reason
for the complete cluster freeze or crash, but that this parameter was probably a bit low
and this parameter was probably a bit high.

That’s what always happens. I have never – really: never – seen a vendor who could
correctly diagnose and explain a hanging cluster or a cluster that kept crashing.
As to Oracle trouble shooting I’m not so worried. Oracle will either have a
performance problem, which is easy to diagnose using the Wait Interface or you’ll
get ora-600 errors that are fairly easy to diagnose, although you’ll need to spend the
required 42 hours logging and maintaining an iTAR or SR or whatever the name is
these days.

In other words: Finding out what’s wrong (if anything) in Oracle is much easier than
finding out what’s wrong with a cluster."

This quote was pulled from http://www.miracleas.dk/WritingsFromMogens/YouProbablyDontNeedRACUSVersion.pdf.

Have Oracle Clusterware and RAC become mature enough that the above is no longer a common problem? The company I now work for deployed RAC 9i and went through 6 months of hell exactly like the scenario above, so they have been burned in the past.

There is also the argument that RAC systems require more scheduled downtime than single-instance systems because there are more Oracle homes to patch (CRS, multiple database homes, ASM homes, etc.).

Personally, I'd love to implement the RAC solution, as I think it is an excellent technology, but somehow I think that I may regret it in the long run...

Received on Thu Oct 09 2008 - 20:39:07 CDT
