 


Re: Raid 50

From: Craig I. Hagan <hagan_at_cih.com>
Date: Thu, 8 Jul 2004 07:33:26 -0700 (PDT)
Message-ID: <Pine.LNX.4.58.0407080701590.28493@svr.cih.com>


> are you sure that they aren't using RAID 5 sets with 5
> or 9 members?

You're right, I forgot to add the parity disk when I worked out the #disks/set. However, the points remain.

Note that this still doesn't violate the statements that I made (I had a feeling that I might have been off by one).

Next, your statement talks about reads, which don't have the stripe-width problem (just chunk size/individual disk) save when operating in degraded mode and a read is performed against data on the failed disk. Raid5 isn't all that bad for random reads -- it is just that most random-read systems also come with random writes, which you didn't address.

This leaves you with two sets of io possibilities (one if the array's minimum io size is a full stripe):

  1. read just the chunk(s) requested, if the data being read is less
     than the stripe width and no drives have failed:

        send io to the sub-disk(s), return result

        NB: this is comparable to raid1 (one iop per disk)

  2. read the entire stripe:

        if drives have failed:
        read the stripe's chunks from the surviving subdisks; unless
        the parity chunk has failed, use it to compute the missing data

        if no faults:
        read the stripe's chunks from the subdisks, return result

        NB: this is also comparable to raid1 (one iop per disk), save in
        degraded mode where you also have a parity computation.

In both cases at most one iop is being submitted to each subdisk. This is important -- and it is part of why raid5 often has radically different read vs. write performance.
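To make case 2's recovery step concrete, here's a rough Python sketch, assuming equal-sized chunks and xor parity (the usual raid5 scheme); the function names are made up for illustration:

    # degraded-mode read: one iop per surviving subdisk, then xor
    # the surviving chunks (data + parity) to rebuild the lost chunk
    def xor_chunks(chunks):
        result = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                result[i] ^= b
        return bytes(result)

    def degraded_read(surviving_chunks):
        # surviving_chunks: the chunks read from the live disks,
        # including the parity chunk; their xor is the missing chunk
        return xor_chunks(surviving_chunks)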

You discussed reads earlier, which is an area where raid5 often does quite well. Writes can be a different matter. The way to achieve full-stripe writes is to issue the io to the OS either as a single large write, or (for OSes/storage which are smart enough to coalesce) as a series of adjacent smaller writes.
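As a rough sketch of the coalescing idea (the (offset, data) queue format and the stripe size are assumptions, not any particular array's implementation):

    STRIPE_SIZE = 256 * 1024  # hypothetical 256k stripe

    def coalesce(pending):
        # pending: (offset, data) writes sorted by offset. merge runs
        # of adjacent writes; a merged run covering a whole stripe can
        # be written with one iop per subdisk, no stripe read needed.
        merged = []
        for offset, data in pending:
            if merged and merged[-1][0] + len(merged[-1][1]) == offset:
                merged[-1] = (merged[-1][0], merged[-1][1] + data)
            else:
                merged.append((offset, data))
        return merged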

When your submitted writes are smaller than the stripe size and random, so that no coalescing can be performed (think oltp with blocksize < stripesize), then you will see this:

read the stripe in
modify the 8k region
compute parity for the stripe
write out to the disks

This requires two operations against every disk in the set, as well as a parity computation. That is inferior to raid1, which would have emitted one iop to each disk. This is a major reason why raid5 isn't chosen for truly random io situations unless the sustained write rate stays below what can be sunk to the disks and the cache can absorb the bursts.
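Back-of-the-envelope, following the read-modify-write sequence above (some controllers instead read just the old data and old parity, which costs four iops; this sketch models the full-stripe path described here):

    def raid5_small_write_iops(disks_per_set):
        # read the whole stripe in, write the whole stripe out:
        # two iops against every disk in the set
        return 2 * disks_per_set

    def raid1_small_write_iops():
        # one write to each side of the mirror
        return 2

    print(raid5_small_write_iops(5), raid1_small_write_iops())  # 10 vs 2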

When the submitted write is, or can be coalesced into, an integer number of stripes, then your io pattern looks like this:

compute parity for the data
write the stripe

which goes back to one io per subdisk.
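A minimal sketch of that full-stripe path, again assuming xor parity and made-up names:

    from functools import reduce

    def full_stripe_write(data_chunks):
        # one parity computation over the stripe, then one write iop
        # per subdisk (the data chunks plus the parity chunk)
        parity = bytes(reduce(lambda a, b: a ^ b, col)
                       for col in zip(*data_chunks))
        return data_chunks + [parity]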

This is an area where raid5 tends to do quite well -- often better than a raid1 pair, because you're splitting the load across more disks (a similar # of iops) rather than duplicating it (the write speed of a raid1 pair == the write speed of a single disk).

Raid5 is often chosen for streaming read/write applications where the submitted requests (from the array's perspective) are sequential io, as raid5 is pretty dang good at that.

This is why I posted in the form of pro/con. The assumption is that folks should understand their system well enough to gauge their io rate, then look at the storage options and choose one which fits their io, space, reliability, and budget requirements.

Received on Thu Jul 08 2004 - 09:30:35 CDT

