Re: San & single point of failure

From: Mark Brinsmead <pythianbrinsmead_at_gmail.com>
Date: Mon, 17 Nov 2008 19:34:25 -0700
Message-ID: <cf3341710811171834u52dbbf5fja2360c8c12a0a78@mail.gmail.com>


Absolutely correct, Bradd.

In theory, if we were willing to put complete faith in our SAN devices, *and* to put complete faith in our operating systems and software, *and* to put complete faith in the humans configuring and operating them, we wouldn't need any redundancy. Everything would work as it was supposed to, and nothing would ever fail. Unless, of course, our faith was misplaced. :-)

The sad fact is, though, *stuff fails*. Hardware fails, firmware fails, software fails (a lot) and people fail (often even more).

People who use (good) SAN devices rarely suffer data loss these days as the result of *disk* failures. (Rarely, but not "never".)

I have been present at sales presentations where sales reps (and pre-sales engineers who really ought to know better) actually *swore* that their SAN device is "infallible", and that no customer using that particular device had *ever* lost data.

I also have friends who work for storage / backup vendors, and have heard plentiful (first-hand) horror stories of simple hardware or firmware upgrades completely obliterating the entire contents of multi-terabyte disk arrays. Permanently and irretrievably. No human error (provably) involved!

SAN devices have become amazingly good at protecting us from data loss due to failure of a single disk, or sometimes even many disks. But what protects us from failure of the SAN device?

Few things move me closer to tears than to review a customer's systems and find all of the following on the same SAN device:

*  Datafiles
*  (All) Online redologs
*  Archived redo logs
*  (All) Controlfiles
*  (All) Backups.
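
A rough sketch of how one might check a layout for this, in Python. Everything in it is an assumption for illustration: the file names are made up, the mount points are borrowed from Claudia's /redo1, /redo2, /big layout, and none of it is an Oracle API. It only groups paths by mount point, which is a proxy at best; two mount points can still sit on the same array, so the real mapping has to come from the SAN administrators.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def mount_point_for(path, mount_points):
    """Return the deepest mount point that contains the given path."""
    p = PurePosixPath(path)
    candidates = [
        m for m in mount_points
        if p == PurePosixPath(m) or PurePosixPath(m) in p.parents
    ]
    if not candidates:
        raise ValueError(f"no mount point covers {path}")
    # The deepest match has the most path components.
    return max(candidates, key=lambda m: len(PurePosixPath(m).parts))


def categories_by_mount(files_by_category, mount_points):
    """Map each mount point to the set of file categories stored on it."""
    result = defaultdict(set)
    for category, paths in files_by_category.items():
        for path in paths:
            result[mount_point_for(path, mount_points)].add(category)
    return dict(result)


def not_multiplexed(files_by_category, mount_points):
    """Return the categories whose copies all live on one mount point."""
    flagged = []
    for category, paths in files_by_category.items():
        mounts = {mount_point_for(p, mount_points) for p in paths}
        if len(mounts) == 1:
            flagged.append(category)
    return sorted(flagged)


# Hypothetical layout: three controlfiles on /big, redo split across two mounts.
EXAMPLE_MOUNTS = ["/", "/redo1", "/redo2", "/big"]
EXAMPLE_FILES = {
    "controlfile": [
        "/big/control01.ctl",
        "/big/control02.ctl",
        "/big/control03.ctl",
    ],
    "online redo": ["/redo1/redo01a.log", "/redo2/redo01b.log"],
}

if __name__ == "__main__":
    print(not_multiplexed(EXAMPLE_FILES, EXAMPLE_MOUNTS))
```

Any category this flags has all of its copies in one place, and any mount point that `categories_by_mount` shows holding both datafiles and backups deserves a second look.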

Don't get me wrong. RAID arrays are great. But we really need to be careful not to trust them *too much*.

On Mon, Nov 17, 2008 at 2:06 PM, Bradd Piontek <piontekdd_at_gmail.com> wrote:

> There are other reasons to multiplex the controlfile. If you only have one,
> you aren't guarded from logical controlfile corruption. Or, say, maybe a DBA
> or admin accidentally removes one of your controlfiles.
>
> In theory, provided your SAN administrators lay things out correctly, there
> may be something to be said for their redundancy at the hardware level. I've
> seen databases sliced up into /data and /archive. As time has gone on in
> my career, I've asked more questions on the layout and thought about things
> a bit more. I'm not sure there is a clear cut answer, but it definitely does
> 'depend'.
>
> Bradd Piontek
> "Next to doing a good job yourself,
> the greatest joy is in having someone
> else do a first-class job under your
> direction."
> -- William Feather
>
>
> On Mon, Nov 17, 2008 at 2:50 PM, Claudia Zeiler <czeiler_at_ecwise.com> wrote:
>
>> All,
>>
>> I have just been given a new server to put a database on. It is a SAN
>> server, but the apparent layout of drives to me is:
>>
>> /redo1
>>
>> /redo2
>>
>> /big everything_else_disk
>>
>>
>>
>> This means that I have just put control_file1, 2, and 3 all in the same
>> place – on /big. I thought that the whole point of multiple control files
>> was to avoid single points of failure, such as a single location.
>>
>>
>>
>> I am told that SAN layout is to handle mirroring, striping, & hot spots
>> behind the scene and I don't need to worry. If this is true, why do I need
>> duplicates of the control file?
>>
>>
>>
>> Something smells fishy to me. Does anyone else have an opinion?
>>
>>
>>
>> -Claudia
>>
>
>

-- 
Cheers,
-- Mark Brinsmead
  Senior DBA,
  The Pythian Group
  http://www.pythian.com/blogs

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 17 2008 - 20:34:25 CST
