Re: RAC or Large SMP...?

From: DA Morgan <damorgan_at_psoug.org>
Date: Fri, 10 Oct 2008 06:17:57 -0700
Message-ID: <1223644676.144318@bubbleator.drizzle.com>


mccmx_at_hotmail.com wrote:

>> I'm also not convinced that the "fewer servers are easier to
>> administer" argument is as valid these days. This was certainly true in the past,
>> but modern package management has become quite sophisticated.
>> Managing larger numbers of servers dedicated to the same role isn't that
>> much of an overhead anymore. At least we haven't seen a substantial
>> increase in administration since moving to RAC. In fact, the added
>> fault tolerance has reduced impact and stress on staff when hardware
>> failures occur.
>>
>> Tim

>
> It's exactly this area of RAC (i.e. administration) that concerns me.
> In your experience does the following scenario sound familiar:
>
>
>
> "Ah yes, troubleshooting. I’ve seen many clusters that just froze for
> no apparent reason in my time. It’s always possible to make the OS or
> Cluster software dump a trace/log file when it happens.
>
> The resulting trace/log file from the cluster will normally be the
> size of Texas, and only one or two people in the entire vendor
> organisation can truly understand them, you will be told.
>
> Then the files (often with sizes measured in GB) are shipped to the
> vendor and some months later they will report back that it wasn’t
> possible to pinpoint the exact reason for the complete cluster freeze
> or crash, but that this parameter was probably a bit low and this
> parameter was probably a bit high.
>
> That’s what always happens. I have never – really: never – seen a
> vendor who could correctly diagnose and explain a hanging cluster or a
> cluster that kept crashing. As to Oracle trouble shooting I’m not so
> worried. Oracle will either have a performance problem, which is easy
> to diagnose using the Wait Interface or you’ll get ora-600 errors that
> are fairly easy to diagnose, although you’ll need to spend the
> required 42 hours logging and maintaining an iTAR or SR or whatever
> the name is these days.
>
> In other words: Finding out what’s wrong (if anything) in Oracle is
> much easier than finding out what’s wrong with a cluster."
>
>
>
> This quote was pulled from http://www.miracleas.dk/WritingsFromMogens/YouProbablyDontNeedRACUSVersion.pdf.
>
> Have Oracle Clusterware and RAC become mature enough that the
> above is no longer a common problem? The company I now work for
> deployed RAC 9i and went through 6 months of hell exactly like the
> scenario above, so they have been burned in the past.
>
> There is also the argument that RAC systems will require more
> scheduled downtime than single instance systems because there are more
> Oracle homes to patch (CRS, multiple database homes, ASM homes etc).
>
> Personally, I'd love to implement the RAC solution as I think that it
> is an excellent technology but somehow I think that I may regret it in
> the long run......

The first RAC implementation I thought stable enough for production consideration was 9.2.0.4. Since then it has gotten substantially better.

Mogens' comments on RAC are accurate within their context, but that is not the only context there is.

If you are going to go into RAC, make sure your budget includes money for a test cluster, to be used for software testing, DBA training, and as a DBA sandbox, as well as money for staff training. With real hands-on RAC training, which unfortunately Oracle itself does not provide, 10gR2 and 11g RAC are not that much more difficult to manage than a stand-alone database.

-- 
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Received on Fri Oct 10 2008 - 08:17:57 CDT
