RE: Write cache for a SAN

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Mon, 3 Nov 2008 15:32:53 -0500
Message-ID: <94A3AC4F4B7D4ACBB14FBD89AF02D644@rsiz.com>


The proper response (from the SAN) is to begin sweeping cache to disk in real time as soon as the cache is down to one power supply, or as soon as the remaining on-board battery life drops below the estimated time needed to flush the cache contents to disk. The SAN's self-diagnostics should prevent controlled flight into terrain. I won't do database work with cache/flash boards that don't meet these criteria (and for those boards the disks and batteries should be on board, so the writes finish even if some jerk yanks the wrong board), and you should demand as much from the SAN vendor(s).

Then you get the write speed when all is well, and you get safety with somewhat degraded performance when there is trouble. Now I suppose someone could pull the wrong board and pry two independent sets of batteries off the board, or stomp on the two on-board disks, but that is really pretty unreasonable.
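
To make the rule above concrete, here is a rough sketch of the decision in Python. It is illustration only and not any vendor's firmware logic: the names CacheStatus, estimated_flush_seconds, should_sweep_to_disk and the safety_factor parameter are all made up for this example, and a real controller would measure these values itself.

```python
from dataclasses import dataclass


@dataclass
class CacheStatus:
    """Hypothetical snapshot of a write cache's health (all fields are assumptions)."""
    healthy_power_supplies: int       # supplies currently feeding the cache
    battery_seconds_remaining: float  # estimated on-board battery life
    dirty_bytes: int                  # cache contents not yet on disk
    flush_bytes_per_second: float     # sustained destage rate to disk


def estimated_flush_seconds(status: CacheStatus) -> float:
    """Time needed to write all dirty cache contents to disk."""
    if status.dirty_bytes == 0:
        return 0.0
    if status.flush_bytes_per_second <= 0:
        return float("inf")  # cannot flush at all: treat as unbounded
    return status.dirty_bytes / status.flush_bytes_per_second


def should_sweep_to_disk(status: CacheStatus, safety_factor: float = 1.0) -> bool:
    """Return True when the cache should start sweeping to disk immediately.

    Two triggers, as described in the message:
      1. the cache is down to one power supply (or fewer), or
      2. the remaining battery life is below the estimated flush time.
    safety_factor > 1.0 adds headroom to the flush estimate (an assumption).
    """
    if status.healthy_power_supplies <= 1:
        return True
    return status.battery_seconds_remaining < estimated_flush_seconds(status) * safety_factor


if __name__ == "__main__":
    # 8 GiB dirty, destaging at 200 MiB/s, both supplies healthy, 10 minutes of battery.
    ok = CacheStatus(2, 600.0, 8 * 2**30, 200 * 2**20)
    print(should_sweep_to_disk(ok))
```

With both supplies healthy and plenty of battery, the cache keeps absorbing writes at full speed; lose a supply or let the battery estimate fall under the flush time, and it switches to sweeping.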

Sorry about your SANtastrophe! Sounds truly horrifying.  

mwf  


From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bobak, Mark
Sent: Monday, November 03, 2008 2:29 PM
To: jkstill_at_gmail.com; dofreeman_at_state.pa.us
Cc: Oracle-L Freelists
Subject: RE: Write cache for a SAN  

Jared,  

We actually suffered a "SANtastrophe" about three years ago, in a real life circumstance that was quite similar to the one you describe.  

The way the story was told to me, the SAN frame has a redundant power supply, and one of the two power supplies went bad, and the unit phoned home. EMC tech shows up w/ replacement power supply in hand, a few hours later. (This is the way it's supposed to work, right?) Well, apparently, he pulled the good power supply rather than the bad one... leaving the frame to come crashing down w/ no power... oops...

36 hours, and about 2.5TB restored from tape later, and we were back in business.  

This really is the exception, though, and not the rule. I don't think disabling write caching is the right move. Ultimately, at some point, you have to have some faith in the hardware you've got and the people you work with.  

-Mark    

--
Mark J. Bobak
Senior Database Administrator, System & Product Technologies
ProQuest
789 E. Eisenhower Parkway, P.O. Box 1346
Ann Arbor MI 48106-1346
+1.734.997.4059  or +1.800.521.0600 x 4059

mark.bobak_at_proquest.com
www.proquest.com
www.csa.com
ProQuest...Start here.


From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jared Still
Sent: Monday, November 03, 2008 2:09 PM
To: dofreeman_at_state.pa.us
Cc: Oracle-L Freelists
Subject: Re: Write cache for a SAN

On Mon, Nov 3, 2008 at 7:19 AM, Freeman, Donald <dofreeman_at_state.pa.us> wrote:

We don't disable it. The SAN manufacturers have made this as bullet proof as possible.

Guess what happens when the storage vendor sends out a tech to replace batteries (the batteries that ensure the write cache stays put), and the tech can't be bothered to follow instructions?

What do you think might happen to the write cache?

I'm not saying that the write cache should be disabled, but you need to ensure that the folks that maintain the HW actually know what they are doing.

Jared Still
Certifiable Oracle DBA and Part Time Perl Evangelist

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 03 2008 - 14:32:53 CST
