Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: Hardware RAID vs. Software RAID

From: Howard J. Rogers <hjr_at_dizwell.com>
Date: Tue, 21 Oct 2003 06:43:41 +1000
Message-ID: <3f944981$0$497$afc38c87@news.optusnet.com.au>


Chuck Lucas wrote:

> "Daniel Morgan" <damorgan_at_x.washington.edu> wrote in message
> news:1066587431.204277_at_yasure...

>> You are correct on all points.
>>
>> But what version you implement can make a big difference. I'd stay away
>> from RAID 5 and vendor RAID
>> implementations with numbers like 2 and 4.
>
> What's wrong with RAID 5?

RAID5 takes your data and stripes it across multiple hard disks (so far so good) and then calculates parity information which it also has to write to disk (not good: extra work, extra I/O). Strictly speaking, RAID 5 distributes the parity across all the disks rather than dedicating a single disk to it (that's RAID 4), but either way the parity lets you compute the lost contents of any failed disk, and thus protects your data; the cost of producing it in the first place, however, can be relatively high.
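The parity trick is nothing more than XOR. A minimal sketch in Python (the stripe data here is made up; real controllers do this in hardware, byte-parallel):

```python
from functools import reduce

# Three data "disks", each holding one stripe of bytes (hypothetical sample data).
disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]

def xor_blocks(blocks):
    """XOR byte strings together, column by column; this is all RAID parity is."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

parity = xor_blocks(disks)          # the extra write RAID 5 must perform

# Disk 1 dies; XOR the survivors with the parity and its contents come back.
recovered = xor_blocks([disks[0], disks[2], parity])
assert recovered == disks[1]
```

Computing and writing that parity on every stripe update is where the extra work comes from.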

However, that's the least of your problems, since in these days of fast hard disks and battery-backed cache, the write penalty associated with RAID 5 is probably not the issue it used to be (in the mid-1990s, for example). Plus, if you only use RAID 5 to house data files, which are written to by DBWR in a deferred way in any case (i.e. a user doesn't have to wait for DBWR to write before being able to move on to the next transaction, but merely has to wait for LGWR), then the concerns about the write penalty are probably overblown.

However, DBWR *is* stressed by RAID5: each small write costs the array roughly four physical I/Os (read the old data block, read the old parity, write both back), where RAID0 needs just one. And if DBWR is slowed down, you can start seeing wait events in the database, such as 'free buffer waits': if DBWR can't flush the buffer cache quickly, a backlog of dirty buffers builds up; dirty buffers can't be overwritten with new data; when you do a select from a new table, you can't find a clean buffer to use for your data; and so you have to wait for DBWR to catch up.
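That multiplication is easy to put numbers on. A back-of-envelope sketch (the write count is an assumed figure, not a benchmark):

```python
# Physical I/Os per logical small write: the classic RAID 5 "write penalty".
raid0_ios_per_write = 1   # just write the block
raid5_ios_per_write = 4   # read old data + read old parity
                          # + write new data + write new parity

dbwr_writes = 10_000      # one hypothetical busy DBWR flush cycle

print(raid0_ios_per_write * dbwr_writes)   # physical I/Os on RAID 0
print(raid5_ios_per_write * dbwr_writes)   # physical I/Os on RAID 5
```

Same logical work for DBWR, four times the physical traffic underneath it.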

That's therefore diagnosable, and there are tuning fixes (such as increasing the number of database writer processes), so it might or might not be a show-stopper in a particular environment. The possibility of it arising would certainly make me nervous, though.

Another real killer is what happens when one of the disks of your array dies. Sure, the database keeps working, but *everything* slows to a trickle as the parity information is used to re-construct the missing disk's contents on the fly. When you replace the faulty hard disk, the resynchronisation that then takes place involves the same sort of work: I/O against the surviving hard disks, computation of the missing data from the parity information, and writing the re-computed data to the new disk. That's a lot of I/O.
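The rebuild cost is mechanical to estimate: to reconstruct one failed member of an N-disk array you must read every surviving disk end to end and write the replacement end to end, all while still servicing normal load. Rough numbers (the disk size and array width here are hypothetical):

```python
disk_gb = 73     # hypothetical member size, typical of the era
n_disks = 6      # hypothetical array width

read_gb = (n_disks - 1) * disk_gb   # read every survivor in full
write_gb = disk_gb                  # write the replacement in full
total_gb = read_gb + write_gb

print(total_gb)  # GB of I/O just to get the array back to normal
```

And every one of those reads competes with the I/O your users are still issuing.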

The real no-no with RAID5, though, is having redo logs anywhere near it. If you had a RAID5 volume for the data files, and a separate RAID0 array for your redo logs, I could probably live with it. What tends to happen, however, is that management insists on RAID5 for everything, because it makes the data safe, doesn't it? And that will bring LGWR to a grinding halt, and everyone suffers in consequence.

Regards
HJR

-- 
--------------------------------------------
See my brand new website, soon to be full of 
new articles: www.dizwell.com.
Nothing much there yet, but give it time!!
--------------------------------------------
Received on Mon Oct 20 2003 - 15:43:41 CDT

