FW: San & single point of failure

From: Herring Dave - dherri <Dave.Herring_at_acxiom.com>
Date: Wed, 19 Nov 2008 06:54:33 -0600
Message-ID: <7ED53A68952D3B4C9540B4EFA5C76E360582C8B6@CWYMSX04.Corp.Acxiom.net>


From this awesome discussion you've all helped me realize that I've got a vulnerability spot with my new servers.  All of them are using ASM, all with 1 big ol' disk group.  I have multiple copies of the controlfile, but they're all on the 1 ASM disk group.  I believe I'll adjust that as soon as I can to put 1 copy on 1 of the available filesystems, so I know for sure that I've got controlfiles on separate LUNs.  Thanks!
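A rough way to spot this situation is to group the controlfile paths by where they appear to live. The sketch below is only an illustration: the function names and sample paths are hypothetical, and it assumes ASM paths start with `+DISKGROUP` and that filesystem paths can be grouped by their first directory component. Two different mount points can still sit on the same LUN or array, so a "looks separate" result would still need confirming with the storage admins.

```python
from collections import defaultdict


def storage_location(path):
    """Return a coarse storage label for a database file path.

    Assumptions (not verified against any real system):
    - ASM paths look like '+DISKGROUP/...'; the label is the disk group.
    - Filesystem paths are labelled by their first directory, e.g. '/big'.
    """
    if path.startswith("+"):
        # Disk group names are case-insensitive, so normalise to upper case.
        return path.split("/", 1)[0].upper()
    parts = [p for p in path.split("/") if p]
    return "/" + parts[0] if parts else "/"


def group_by_location(paths):
    """Map each storage label to the list of paths found under it."""
    groups = defaultdict(list)
    for p in paths:
        groups[storage_location(p)].append(p)
    return dict(groups)


def is_single_location(paths):
    """True when every path falls under the same storage label."""
    return len(group_by_location(paths)) <= 1


if __name__ == "__main__":
    # Hypothetical controlfile list: two copies in ASM, one on a filesystem.
    control_files = [
        "+DATA/orcl/controlfile/control01.ctl",
        "+DATA/orcl/controlfile/control02.ctl",
        "/u01/oradata/orcl/control03.ctl",
    ]
    for location, files in group_by_location(control_files).items():
        print(location, files)
    print("single location:", is_single_location(control_files))
```

The list of paths would normally come from the instance itself (for example the `CONTROL_FILES` parameter); here it is hard-coded to keep the sketch self-contained.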

Dave



Dave Herring, DBA |   A c x i o m  M I C S / C S O
630-944-4762 office | 630-430-5988 wireless | 630-944-4989 fax

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Mark Brinsmead
Sent: Monday, November 17, 2008 8:34 PM
To: piontekdd_at_gmail.com
Cc: czeiler_at_ecwise.com; oracle_l
Subject: Re: San & single point of failure

Absolutely correct, Brad.

In theory, if we were willing to put complete faith in our SAN devices, and to put complete faith in our operating systems and software, and to put complete faith in the humans configuring and operating them, we wouldn't need any redundancy.  Everything would work as it was supposed to, and nothing would ever fail.  Unless, of course, our faith was misplaced.  :-)

The sad fact is, though, stuff fails.  Hardware fails, firmware fails, software fails (a lot) and people fail (often even more).

People who use (good) SAN devices rarely suffer data loss these days as the result of disk failures.  (Rarely, but not "never".)

I have been present at sales presentations where sales reps (and pre-sales engineers who really ought to know better) actually swore that their SAN device is "infallible", and that no customer using that particular device had ever lost data. 

I also have friends who work for storage / backup vendors, and have heard plentiful (first-hand) horror stories of simple hardware or firmware upgrades completely obliterating the entire contents of multi-terabyte disk arrays.  Permanently and irretrievably.  No human error (provably) involved!

SAN devices have become amazingly good at protecting us from data loss due to failure of a single disk, or sometimes even many disks.  But what protects us from failure of the SAN device?

Few things move me closer to tears than to review a customer's systems and find all of the following on the same SAN device:

*  Datafiles
*  (All) Online redologs
*  Archived redo logs
*  (All) Controlfiles
*  (All) Backups.

Don't get me wrong.  RAID arrays are great.  But we really need to be careful not to trust them too much.

On Mon, Nov 17, 2008 at 2:06 PM, Bradd Piontek <piontekdd_at_gmail.com> wrote:

There are other reasons to multiplex the controlfile. If you only have one, you aren't guarded against logical controlfile corruption. Or, say, a DBA or admin accidentally removes one of your controlfiles.

In theory, provided your SAN administrators lay things out correctly, there may be something to be said for their redundancy at the hardware level. I've seen databases sliced up into /data and /archive. As time has gone on in my career, I've asked more questions about the layout and thought about things a bit more. I'm not sure there is a clear-cut answer, but it definitely does 'depend'.

Bradd Piontek

  "Next to doing a good job yourself, 
        the greatest joy is in having someone 
        else do a first-class job under your  
        direction."
 -- William Feather

On Mon, Nov 17, 2008 at 2:50 PM, Claudia Zeiler <czeiler_at_ecwise.com> wrote:

All,
I have just been given a new server to put a database on.  It is a SAN server, but the apparent layout of drives to me is:
/redo1
/redo2
/big    everything_else_disk

 
This means that I have just put control_file1, 2, and 3  all in the same place - on /big.  I thought that the whole point of multiple control files was to avoid single points of failure, such as a single location.  
I am told that the SAN layout handles mirroring, striping, & hot spots behind the scenes and I don't need to worry.  If this is true, why do I need duplicates of the control file?  
Something smells fishy to me.  Does anyone else have an opinion?  
-Claudia
-- 
Cheers,
-- Mark Brinsmead
  Senior DBA,
  The Pythian Group
  http://www.pythian.com/blogs

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Nov 19 2008 - 06:54:33 CST
