Re: Storage array advice anyone?

From: Matthew Zito <mzito_at_gridapp.com>
Date: Thu, 16 Dec 2004 01:54:21 -0500
Message-Id: <506DBE1E-4F2F-11D9-8ED0-000393D3B578@gridapp.com>

This is the sort of issue that comes up often on oracle-l. In a nutshell: "Is RAID-5 acceptable for database workloads?" A lot of great writing has been done on the subject, but the tragedy is that much of it is very, very old and could stand a rewrite. I cover some of these topics in my forthcoming storage book from O'Reilly (*plug* *plug*), but my overall opinion is that if you can afford the utilization penalty for RAID-10, you should take it every time.

However, RAID-F is not the terrible thing it's made out to be. In my opinion, these are the things that have significantly changed the landscape for parity-protected RAID levels:

-RAID-6 (aka RAID-DP) - basically adding an extra parity disk to very large RAID-5 sets, so the set survives two disk failures (that is, data loss occurs only when a third disk dies).
-Virtualization/abstraction of storage objects - when the LUN you are sending I/Os to is built from chunks of 50 different spindles across 10 different RAID-5 groups, the performance is excellent, because the load is spread over all of those spindles. Another example is HSM, which allows infrequently used blocks to be "paged" out to RAID-5/6 devices while the high-performance blocks remain on RAID-10. Yet another is the "third mirror", BCV, or Shadow Copy (whatever the vendor term of the week is) for a point-in-time copy of your database, though that addresses recoverability rather than availability.
-Predictive failure analysis - most drives soft-fail and throw errors before they hard-fail (head crash, etc.). More modern disk arrays will preemptively bring in a hot spare to replace a drive that has logged more than a certain number of errors. The spare is rebuilt directly from the dying disk for as long as that disk keeps responding properly.
-Hardware/ASIC-based parity checksumming - a performance improvement, plain and simple, thanks to pipelining and parallelization of parity generation
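To make "parity generation" concrete, here is a minimal Python sketch of single-parity (RAID-5-style) XOR over one stripe, the work that parity ASICs pipeline in hardware. The function names are hypothetical; real arrays rotate parity across disks and use much larger blocks, and RAID-6 adds a second, differently computed syndrome that this sketch does not cover.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks: list[bytes]) -> bytes:
    # The parity block for a stripe is the XOR of every data block in it.
    return xor_blocks(data_blocks)

def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    # XOR of the surviving data blocks plus parity yields the one lost block.
    return xor_blocks(surviving_blocks + [parity])

stripe = [b"ORAC", b"LE-L", b"DATA"]   # three data "disks" in one stripe
parity = make_parity(stripe)
recovered = rebuild_missing([stripe[0], stripe[2]], parity)  # disk 1 lost
print(recovered)  # b'LE-L'
```

Note that a rebuild must read every surviving block in the stripe, which is why rebuilds on large parity groups take a long time.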

There are really two arguments that seem to come up against RAID-F:

-Performance - RAID-F is slow
-Availability - RAID-F is inherently less reliable

The performance argument simply doesn't stand up as an absolute anymore. There are three reasons for that: RAID-5 implementations have gotten better, newer technologies like the ones I list above remove many of RAID-5's shortcomings, and storage in general has gotten faster. Many databases that I have seen were very carefully tuned for the specific array (best practices, logs, indexes, and datafiles all on separate disks, etc.) and would have been just as fast had they thrown everything onto one big volume and let the array sort it out. In fact, I see many organizations creating lots of small storage objects for various performance-driven purposes when they were all being carved out of the same RAID group, rendering any benefit imaginary at best. RAID-5 may not always be as fast as RAID-10, but often it doesn't need to be. Look at it this way: we'd all like to run our databases on the biggest iron possible to improve performance, but we're forced to deal with the servers that are acceptable from a budgetary and management perspective. The same is true of storage.
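The usual quantitative form of the performance argument is the small-write penalty. By the common rule of thumb, a random host write costs about 2 back-end I/Os on RAID-10, 4 on RAID-5 (read data, read parity, write data, write parity), and 6 on RAID-6. The minimal Python sketch below assumes a purely random workload, no cache benefit, and a hypothetical 150 IOPS per drive; the function name is also hypothetical. It shows how much the gap depends on the read/write mix.

```python
# Back-end disk I/Os per small random host write (common rule of thumb).
WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}

def host_iops(disks: int, iops_per_disk: float, read_fraction: float, level: str) -> float:
    """Approximate random host IOPS a RAID group can sustain.

    Each host read costs one back-end I/O; each host write costs
    WRITE_PENALTY[level] back-end I/Os. Ignores cache, full-stripe
    writes and controller limits.
    """
    backend = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    cost_per_host_io = read_fraction + write_fraction * WRITE_PENALTY[level]
    return backend / cost_per_host_io

# 14 drives at a hypothetical 150 IOPS each, 70% reads / 30% writes.
for level in ("RAID-10", "RAID-5", "RAID-6"):
    print(f"{level}: {host_iops(14, 150, 0.7, level):.0f} host IOPS")
# RAID-10: 1615 host IOPS
# RAID-5: 1105 host IOPS
# RAID-6: 840 host IOPS
```

For a read-only workload all three levels come out the same; write-back cache and full-stripe writes in modern arrays also hide much of the penalty, which is why RAID-5 is often fast enough in practice.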

The availability argument is true, though the techniques above have again mitigated it over time. The key issue, though, is: what is the desired/required availability for the database? We would all like a 24/7 database that's as reliable as possible, but we all make decisions about where to cut corners on availability. Many organizations trust their storage arrays to be redundant from an operational perspective, yet most of the truly damaging outages I've seen in my time working with storage were due to array failures that had nothing to do with RAID group configuration. For example, in most fibre channel arrays today, yanking an active disk drive has a reasonable probability of taking down the entire fibre channel loop, killing off up to 128 drives at once. Yet very few organizations mirror across storage arrays online (though many mirror them remotely for DR).

The general argument FOR RAID-10 seems to be, "It's better, and it doesn't really cost THAT much more." The fact remains that it does cost more, and can cost a great deal more than a RAID-F configuration, depending on group sizing. For example, take 140 disks in two different configurations: 10 14-drive RAID-10 sets versus 10 14-drive RAID-6 sets (two reasonable standard configurations offered by EMC CLARiiON and NetApp NearStore). With 73GB drives, RAID-10 nets you just a shade over 5TB, while RAID-6 gets you 8.7TB.
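The capacity arithmetic behind the 140-disk comparison can be sketched in a few lines of Python. This assumes nominal drive sizes, decimal units (1TB = 1000GB), and no hot spares or formatting overhead, so real usable space will be somewhat lower; the function name is hypothetical.

```python
def usable_gb(level: str, drives_per_set: int, sets: int, drive_gb: float) -> float:
    """Usable capacity, ignoring hot spares, formatting and vendor overhead."""
    if level == "RAID-10":
        data_drives = drives_per_set // 2   # half of each set holds mirror copies
    elif level == "RAID-5":
        data_drives = drives_per_set - 1    # one drive's worth of parity per set
    elif level == "RAID-6":
        data_drives = drives_per_set - 2    # two drives' worth of parity per set
    else:
        raise ValueError(f"unsupported level: {level}")
    return data_drives * sets * drive_gb

for level in ("RAID-10", "RAID-6"):
    tb = usable_gb(level, drives_per_set=14, sets=10, drive_gb=73) / 1000
    print(f"{level}: {tb:.2f} TB")
# RAID-10: 5.11 TB
# RAID-6: 8.76 TB
```

Each 14-drive RAID-10 set keeps 7 drives' worth of data and each RAID-6 set keeps 12, so the same 140 spindles yield roughly 70% more usable space under RAID-6.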

The ideal way to look at things is from the business perspective: is the improved reliability of RAID-10 important enough _for this application_ that it is worth the increased cost? Vet your vendor heavily; if necessary, hire someone impartial to come in and explain exactly what the gotchas are going to be with the products you'll be buying. Then figure out what your exposure is going to be: if you run RAID-10, will you be buying another disk array in a year? If you run RAID-5 and you lose two disks, how long will it take to recover?
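One rough way to start quantifying that exposure is the chance that a second drive fails while the first is still rebuilding. The Python sketch below assumes independent, exponentially distributed drive failures with a constant MTTF; the 500,000-hour MTTF and 24-hour rebuild window are hypothetical placeholders, not vendor figures. Failures in practice tend to come in clusters (a point the quoted replies below also make), and this model ignores unrecoverable read errors during rebuild, so treat the results as optimistic.

```python
import math

def p_second_failure(other_disks: int, rebuild_hours: float, mttf_hours: float) -> float:
    """Probability that at least one of `other_disks` fails during the rebuild.

    Assumes independent, exponentially distributed failures with a
    constant per-drive MTTF.
    """
    rate = other_disks / mttf_hours          # combined failure rate of the survivors
    return 1.0 - math.exp(-rate * rebuild_hours)

MTTF = 500_000   # hypothetical per-drive MTTF, in hours
REBUILD = 24     # hypothetical rebuild window, in hours

# 14-drive RAID-5 set: losing any of the 13 survivors loses data.
raid5 = p_second_failure(13, REBUILD, MTTF)
# RAID-10: only the failed drive's mirror partner matters.
raid10 = p_second_failure(1, REBUILD, MTTF)
print(f"RAID-5 (14 drives): {raid5:.2e}")   # 6.24e-04
print(f"RAID-10 pair:       {raid10:.2e}")  # 4.80e-05
```

The ratio tracks the number of drives whose failure would be fatal, so shorter rebuild windows and smaller parity groups reduce RAID-5 exposure directly.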

Again, I'm not defending RAID-F as being as good as RAID-10. I'm simply saying that dismissing RAID-5/F out of hand as a waste of time, based on old information and preconceptions, is like dismissing Linux based on the way it was back in 1998. Times change, and keeping costs down is something that, IMHO, not enough technology people think about.

Thanks much,
Matt

--
Matthew Zito
GridApp Systems
Email: mzito_at_gridapp.com
Cell: 646-220-3551
Phone: 212-358-8211 x 359
http://www.gridapp.com


On Dec 15, 2004, at 8:36 PM, Joel Garry wrote:
>> On Tue, 14 Dec 2004 10:47:20 +0000, chris_at_thedunscombes.f2s.com
>>> My experience is that with either RAID 5 or 10 you have to be
>>> unbelievably unlucky to lose data providing disks are replaced when
>>> they fail and not left for a few days or even more. You are talking
>>> extremely remote. It might be an idea to get someone to do the maths
>>> and work out the probabilities.
>>
>> I, for one, have been that unlucky on at least one occasion
>
> Me too.  No one was listening to the standby machine 350 miles away,
> going <little fly voice> Help me!  Help me!</little fly voice>
>
> Also, more often, seen what Cary points out, failures happen in
> clusters or dominos.
>
> Joel Garry
> http://www.garry.to
>
>
> --
> http://www.freelists.org/webpage/oracle-l


Received on Thu Dec 16 2004 - 00:54:33 CST