RE: SAC NORAD and Dustin Hoffman. WAS: HA HA HA on your 24/7 systems

From: Steve Orr <sorr_at_arzoo.com>
Date: Mon, 16 Oct 2000 09:31:09 -0700
Message-Id: <10651.119337@fatcity.com>

Actually, NORAD was designed to survive a direct hit from the weapons that existed when it was built. However, with today's more accurate delivery systems it is conceivable that a missile could navigate part way through the entrance tunnel and make the facility inoperable. Then there are multiple direct hits...

But of course, none of this has been tested and sadly, this is often the case with HA 24X7 systems. You need sufficient pre-production quiet time to test your HA solution. I call it the "pseudo sledge hammer" testing period. Have you ever taken a drive out of your RAID and replaced it to see how long resilvering takes and what happens to I/O performance? How much time does it take to test the entire HA implementation and how much time will you be given? The trouble is that you get all this expensive equipment into the data center and install Oracle, and then damagement is anxious to get the entire application up and running ASAP and asks you to take shortcuts or just trust that everything will work. But really you haven't finished the job until you've reasonably tested everything end to end.
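
On the MTBF and MTTR arithmetic Ross mentions further down the thread, here is a rough Python sketch of how the numbers combine. It is only an illustration under stated assumptions: the function names are invented for this example, every MTBF and MTTR figure is made up, and the redundancy formula assumes independent failures, which is exactly what a shared rack or one hand on the 'OFF' switch can break.

```python
# Availability arithmetic from MTBF and MTTR.
# All names and figures here are hypothetical illustrations,
# not vendor specifications or measured values.

HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*parts: float) -> float:
    """Every part must be up, so each one is a single point of failure."""
    result = 1.0
    for a in parts:
        result *= a
    return result


def parallel(*parts: float) -> float:
    """Redundant parts: the group is down only when all parts are down.

    Assumes failures are independent of each other.
    """
    all_down = 1.0
    for a in parts:
        all_down *= 1.0 - a
    return 1.0 - all_down


def downtime_hours_per_year(a: float) -> float:
    """Expected hours of downtime per year at availability a."""
    return (1.0 - a) * HOURS_PER_YEAR


# Made-up inputs: one server node and one shared disk array.
node = availability(mtbf_hours=5_000, mttr_hours=4)
disk_array = availability(mtbf_hours=50_000, mttr_hours=8)

# One node plus its disks, versus two nodes sharing the same disks.
single_node = series(node, disk_array)
two_nodes_shared_disk = series(parallel(node, node), disk_array)

for label, a in [
    ("single node + array", single_node),
    ("two nodes, shared array", two_nodes_shared_disk),
]:
    print(f"{label:24s} {a:.6f}  ~{downtime_hours_per_year(a):.2f} h/yr down")
```

However good the node redundancy gets, the combined figure can never exceed the availability of the shared array, since it sits in series with everything else. Real numbers would have to come from your own hardware and from timing your own repairs during that testing period.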

IMHO,
Steve Orr

-----Original Message-----
From: root_at_fatcity.com [mailto:root_at_fatcity.com] On Behalf Of Mohan, Ross
Sent: Monday, October 16, 2000 7:06 AM
To: Multiple recipients of list ORACLE-L
Subject: SAC NORAD and Dustin Hoffman. WAS: HA HA HA on your 24/7 systems

That's why they say that SAC/NORAD (Strategic Air Command HQ, North American Aerospace Defense Command), buried deep inside a mountain in Colorado, is a "single point of failure" for US national defense:

All it takes is a direct hit by one nuclear bomb to bring down the whole facility! :-)

In the words of the Marathon Man's tormentor:

"Is it safe?"

-----Original Message-----
Sent: Friday, October 13, 2000 7:45 PM
To: Multiple recipients of list ORACLE-L

Sorry Ross. Yes I am familiar with enterprise class storage systems.

It still isn't HA.

It only takes one bumbling SA ( or DBA ) to bring the system down, one neanderthalic techie in the computer room to push the 'OFF' switch.

Simultaneous failure of both controllers for an array, or of enough disks to bring the array down, is not unheard of.

Jared

On Fri, 13 Oct 2000, Mohan, Ross wrote:

> I have to say this "disk is a single point of failure"
> is jangling to the cognitive logic subsystem.
>
> Why?
>
> Well, the disk farms i have seen have redundant controllers,
> with redundant channels, TRIPLE power supplies, at least a
> single mirror with dual porting. There's your "single" disk
> point of failure for you.
>
> Now, try this: Take your two "redundant" nodes....put them
> in a really really big rack and then inside ONE big box. <G>
>
> Are the two nodes ( which now have at least redundant CPUs,
> power supplies, etc. ) a "single point of failure"?
>
> Come on, guys, if you've worked with this stuff a bunch you know:
>
> (a) properly configured diskfarms have a great MTBF, better
> than the other hardware, and
> (b) to REALLY answer Mary's class of questions, you need to
> calculate MTBFs and MTTRs.
>
> The rest is armchair clustering!
>
> hope this pertains,
>
> Ross Mohan
>
> p.s. HA is the latest marketspeak for "failover" or "redundant" or
> whatever...
> please try to browse a copy of "In Search of Clusters" by Gregory Pfister
> from IBM. It's a cult classic, a helluva fun read, and one of the best
> thought-out technical books i have ever seen, period.
>
>
> -----Original Message-----
> Sent: Thursday, October 12, 2000 2:00 PM
> To: Multiple recipients of list ORACLE-L
>
>
>
> Mary,
>
> OPS is not an HA solution. While you may still have
> an instance running if a node goes down, the storage
> medium is still a single point of failure.
>
> Jared
>
> On Thu, 12 Oct 2000, Ruiz, Mary A (CAP, CDI) wrote:
>
> > I need a little advice. We have a fairly new (< 1 year) 8.1.5 instance to
> > support my company's internet business. We recently changed our network
> > solutions provider and now my management wants to achieve a higher level of
> > redundancy than it currently does with mirrored disks. The solution being
> > proposed by my Sysadmin is an Oracle Parallel Server solution. Some
> > background is in order here - we have always shut our databases down at
> > night for backups. I am not highly skilled in backup and recovery although
> > I tried some of the hot backup techniques from this list and was able to
> > recover successfully to another server. I noticed that the course offered
> > by Oracle in OPS has backup and recovery as well as performance tuning as
> > pre-requisites, which indicates to me that OPS could be extremely
> > challenging. Also, I have read mainly unfavorable comments about OPS from
> > this list, but most of those comments were based on the Oracle 7
> > implementations (High administrative costs, difficult to implement, etc.).
> >
> > Have things improved with Oracle 8i ? Is OPS worth pursuing? Or should I
> > convince my management that extra $$ spent in, say, a hot standby database
> > is well worth it? Is there any other solution that would not involve a
> > second set of disks, rather a second database on the same set of disks ??
> >
> > Thanks in advance,
> > Mary Ruiz / Atlanta
Received on Mon Oct 16 2000 - 11:31:09 CDT