
RE: 64 node Oracle RAC Cluster (The reality of...)

From: Mark W. Farnham <>
Date: Wed, 22 Jun 2005 15:51:38 -0400
Message-ID: <>

A couple of reasons for starters:

  1. Just because a CFS is supported doesn't mean it is the most reliable service the OS provides. If a given vintage of ASM or straight shared raw has fewer "moving parts" (shall we say less code path?) than a given CFS, why expose yourself to the increased chance of a SPOF? Actually that's a "Where on the slippery slope between ease of admin and maximum availability do you choose to be?" question. There is room for an honest argument (in lieu of sufficient differential measurements to trust) over whether outages due to CFS failures would be more or fewer than outages due to the logical complexity of having multiple copies of the same ORACLE_HOME. Presumably the answer will vary with the OS, the CFS, the quality of the shared disk infrastructure, the quality of internode communications, the number of nodes, and the acumen of the persons involved. (Okay, as to the persons it may have more to do with compulsive attention to detail than with acumen.)
  2. Now let's just suppose you have several nodes in a grid/RAC. Of course, first you're going to test the new release/patch on your isolated wee little 2-3 node test "grid" and make sure it doesn't do bad things. Then you take two or three of the headroom nodes (capacity-above-need-at-peak-load nodes) offline (i.e. you stop their instances of your production database nicely and politely).

    Now, being sure to have in place on these offline nodes the files that your reboot routines check to prevent unwanted instance restart on reboot, you apply the patch/upgrade to these nodes, along with any database changes to the wee little test database you have spanning your production grid (of course the test database instances on the online nodes of the grid are down and locked off in the same manner the production instances are locked off on the nodes you are upgrading). A rough sketch of the stop-and-disable mechanics follows after this list.

    So now you have a few nodes of your test database up and running on the production environment, and you make sure it works, running the regression suite you've developed to avoid severe load on the production shared disk farm while still fully testing the functionality that must work to avoid losing your company enough money to land your butt in the unemployment line. (I hope you have no actuals on that, so you'll have to make an old-fashioned guess. You have CJLAD (Compulsive Job Loss Avoidance Disorder) or you really didn't want to stay there anyway if the answer is less than a few thousand bucks.) If you're going to this much trouble there are probably a few more digits involved, unless we're betting on donuts.

    After your test "in situ" works out okay, you prepare all the nodes for the new ORACLE_HOME and schedule your clean bounce, third-plex split or other quick backup method, pause application of standby logs, and pull the trigger where any required database upgrade component of the release/patch takes place. After your "Deer Hunter" moment you relax, remove and throw away the cork from a nice bottle of Scotch, and do what any honest fellow (person, for the PC, but "any honest fellow" is from a rare poem that I know) does with a bottle of Scotch that can't be corked. Black Adder in Pete's case, if I recall correctly; I'll take Dalwhinnie; some may stray toward Cabo Wabo, but that's not even Scotch....
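
Here is the rough sketch mentioned in point 2: a minimal illustration of the "stop the instances politely and keep them from restarting" step, written as Python shelling out to the 10g-era srvctl utility. The database name PROD, the instance names, and the choice of "srvctl disable" as the restart-blocking mechanism are all assumptions for the example; Mark's "files that your reboot routines check" suggests a site-specific mechanism, for which srvctl disable merely stands in here.

    #!/usr/bin/env python
    # Sketch only: stop and disable a few "headroom" instances ahead of a
    # rolling patch, then enable and restart them when the patching is done.
    # The database name (PROD) and instance names are hypothetical.
    import subprocess

    DB = "PROD"
    HEADROOM_INSTANCES = ["PROD5", "PROD6", "PROD7"]  # instances on the nodes being patched

    def srvctl(*args):
        # Run a single srvctl command, echo it, and stop if it fails.
        cmd = ["srvctl"] + list(args)
        print(" ".join(cmd))
        subprocess.check_call(cmd)

    def take_offline():
        # Stop each instance nicely, then disable it so an unplanned reboot
        # during patching does not bring it back up by surprise.
        for inst in HEADROOM_INSTANCES:
            srvctl("stop", "instance", "-d", DB, "-i", inst)
            srvctl("disable", "instance", "-d", DB, "-i", inst)

    def bring_back():
        # Re-enable and restart once the new ORACLE_HOME is in place and tested.
        for inst in HEADROOM_INSTANCES:
            srvctl("enable", "instance", "-d", DB, "-i", inst)
            srvctl("start", "instance", "-d", DB, "-i", inst)

    if __name__ == "__main__":
        take_offline()
        # ... apply the patch/upgrade to these nodes, run the regression suite ...
        bring_back()

In real life you would of course wrap far more checking around this (is the instance really down, did the patch apply cleanly), and it would be one step in a larger change-control procedure rather than something run by hand.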



-----Original Message-----
From: [] On Behalf Of Pete Sharman
Sent: Tuesday, June 21, 2005 8:43 PM
Cc: Peter Ross Sharman
Subject: RE: 64 node Oracle RAC Cluster (The reality of...)


Seriously, where a CFS is supported by the OS, why would you do anything else for the ORACLE_HOME?


"Controlling developers is like herding cats."
Kevin Loney, Oracle DBA Handbook

"Oh no, it's not. It's much harder than that!"
Bruce Pihlamae, long-term Oracle DBA

You haven't seen my cat....

Received on Wed Jun 22 2005 - 15:58:33 CDT
