RAID overview
Date: Mon, 21 Nov 1994 23:26:12 GMT
Message-ID: <Czn53o.1sF_at_nntpa.cb.att.com>
In article <3ag31d$611_at_nova.sti.nasa.gov>,
Karen Huguley <khuguley_at_sti.nasa.gov> wrote:
>In article <Cz29Bw.HwL_at_nntpa.cb.att.com>, lou2_at_cbnews.att.com
>(louis.avrami..jr) says:
>>>snip<<
>>
>>If anyone is interested, I'll post a synopsis of RAID issues that I wrote
>>for my group.
>>
>>Lou Avrami ( attmail!lavrami )
>>
>>>>snip snip<<<
>
>Thanks for the info. I would be interested in reading your synopsis of
>RAID issues. We are installing RAID for use with our ORACLE server, and
>I am very interested in others' experiences. What level is best for
>ORACLE7? 0,1,3,5?? Tradeoffs?? Any insights/information would be greatly
>appreciated.
>
>Karen Huguley
>DBA
>NASA Center for Aerospace Information
Several folks indicated interest in an overview of RAID that I wrote several months ago, so I thought that I would post it for everyone. My group ended up going with RAID 5. RAID 5 is good for reads and slow for writes, but my system is a little bit unusual in that the majority of our transactions (>80%) are reads. We do most of our database updates at night, so our users are not affected by the somewhat slower write performance of RAID 5.
We initially chose RAID 1, but found out that our RAID cabinet could only mirror disks within the same rank. Since the cabinet (the NCR 6298) had 5 disks per rank, we would have had to "give up" (not use within a RAID array) one disk per rank. Because of the unique aspects of our application with respect to writes, we decided to make do with RAID 5.
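The disk arithmetic behind that decision can be written out as a small Python sketch. It assumes, as described above, that mirrored pairs cannot span ranks, and it further assumes that a RAID 5 array uses every disk in the rank and spends one disk's worth of capacity on parity. The function name is made up for illustration.

```python
def usable_disks(disks_per_rank, level):
    """Return (disks' worth of user data, disks left idle) for one rank.

    Assumptions (illustrative only): RAID 1 pairs must stay inside the
    rank, and RAID 5 uses the whole rank with one disk's worth of parity.
    """
    if disks_per_rank < 2:
        raise ValueError("need at least two disks per rank")
    if level == "raid1":
        pairs = disks_per_rank // 2
        # Each mirrored pair stores one disk's worth of data.
        return pairs, disks_per_rank - 2 * pairs
    if level == "raid5":
        return disks_per_rank - 1, 0
    raise ValueError("unsupported level: " + level)


# A rank of 5 disks, as in the cabinet described above.
print(usable_disks(5, "raid1"))  # (2, 1): two disks of data, one disk idle
print(usable_disks(5, "raid5"))  # (4, 0): four disks of data, none idle
```

With 5 disks per rank, mirroring leaves one disk unused and stores two disks' worth of data, while RAID 5 stores four.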
I've always thought that the best overall performance for a database application could be gained with RAID 1, with perhaps striping across both disks (a combination of RAID 1 and 0, called RAID 10). This can be very expensive, especially if you want a totally redundant system. You need two of everything (disks, controllers, etc.).
The RAID Advisory Board can be contacted at 13 Marie Lane, St. Peter, MN 56082-9423, tel (507)931-0967, fax (507)931-0976, MCI Mail ID: 470-6032.
Any opinions stated are mine, not my company's.
Hope this is helpful. If anyone would like to ask specific questions, please E-mail me.
Lou Avrami ( attmail!lavrami )
WHAT IS RAID?
The term RAID originally stood for Redundant Array of Inexpensive Disks. The RAID Advisory Board officially defines RAID as Redundant Array of Independent Disks. I guess they recognized that these systems weren't as cheap as they thought.
RAID is a collection of hard disks electronically tied together in a single system, so that if a single disk drive crashes, the contents of that disk are protected. There are six basic levels of RAID, 0 through 5. A higher number does not mean that the RAID performance is higher; it is simply a different strategy for accomplishing data storage and protection.
The RAID Advisory Board defines RAID as "a disk array in which part of the storage capacity is used to store redundant information about user data stored on the remainder of the storage capacity. The redundant information enables REGENERATION of user data in the event that one of the array's member disks or the access path to it fails."
FAULT TOLERANCE
Before going into the different RAID levels, the issue of failed components/data availability/downtime should be discussed. There are four different ways in which a failed component (in our case, a failed disk or controller) can be replaced:
Cold Swap
    System operation must stop, and electrical power must be removed before the component is replaced.
Warm Swap
    System operation must stop, but electrical power does not have to be removed before component replacement takes place.
Hot Swap
    System operation does not need to stop, but human intervention is required for component replacement.
Automatic Swap
    The component can be hot swapped, and a replacement component is pre-installed so that no human intervention is needed to bring it into service. Automatic swapping is common with RAID Level 1, 3, 4 and 5 arrays.

The level of data availability that a system requires is important to determine. It could have an impact on the price of a RAID system, how that system performs, as well as how long a system may be unavailable to users if there is a failure.
HARDWARE/SOFTWARE ISSUES
When implementing RAID, different combinations of hardware and software solutions can be utilized:
Software-based
    The least expensive solution. This approach steals CPU cycles from the host computer to do the array processing. Depending on the host's load, this may or may not affect performance.
Bus-based
    A co-processor is installed on the server to handle array processing. Multi-processor platforms may dedicate one or more processors to RAID.
Hardware-based
    An array processor is located in the disk array storage system.

A hardware-based array ensures the least impact on the system if a drive fails; the software approach offers the least protection.
Another hardware issue is the reliability of the disk drives and the controllers. The MTBF (Mean Time Between Failure) rate of the disks and the controllers is very important. The higher the number, in hours, the better.
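A vendor's MTBF figure applies to one component, but an array has many. As a rough illustration (not taken from any vendor's data sheet), if the components are assumed to fail independently and at a constant rate, their failure rates add, so the mean time until the first failure in the group is shorter than any single MTBF. The sketch below uses a hypothetical function name and a hypothetical 200,000-hour disk.

```python
def combined_mtbf(mtbf_hours):
    """Mean time to the FIRST failure among a group of components.

    Assumes independent components with constant failure rates, so the
    rates (1 / MTBF) simply add. Real drives only approximate this.
    """
    if not mtbf_hours or any(m <= 0 for m in mtbf_hours):
        raise ValueError("need at least one positive MTBF value")
    total_rate = sum(1.0 / m for m in mtbf_hours)
    return 1.0 / total_rate


# Five disks, each with a hypothetical MTBF of 200,000 hours.
print(round(combined_mtbf([200_000] * 5)))  # 40000
```

This is the time to the first component failure, not to data loss. A redundant array survives that first failure, which is the point of RAID.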
The controllers used by the RAID system are an important factor. Today, the best performance is offered by controllers that meet the SCSI-2 Fast-Wide protocols. Also, the best-performing controllers are those with a powerful on-board CPU.
BASIC RAID LEVELS
As I said earlier, there are six basic levels of RAID. Two of them, RAID 2 and RAID 4, are usually not used in commercial applications.

RAID 2 uses bit-interleaving, which breaks data down into bit-sized blocks and distributes it across the different disk drives. Additional disks perform error detection tasks. It takes a lot of disks to perform the error checking, and a lot of computer power. For expensive applications, like oil exploration on a Cray supercomputer, RAID 2 makes sense. It's just too many disks, and too expensive for normal business applications.

RAID 4 uses a parity drive, which stores the error-checking that takes up many drives in RAID 2. This parity drive becomes a major bottleneck when performing write operations, since all writes have to go through this parity drive. As a result, RAID 4 arrays are generally not available. The RAID Advisory Board states that "application requirements that RAID Level 4 serves ... are unconditionally better served by RAID Level 5."
So, that leaves RAID Levels 0, 1, 3 and 5:
RAID 0
    Technically, RAID 0 is not really RAID, because it does not provide any data redundancy. RAID 0 is an array which stripes data across a number of disks while treating them all as a single unit. If a disk fails, data can rarely be retrieved since it has been spread across several disks, and there is no redundancy of data. Disk striping supposedly increases I/O performance.
RAID 1
    Disk drives are mirrored; identical data is written to two identical drives so that each piece of data is stored twice. RAID 1 can read data faster than a single disk because a read request can be handled by either drive (whichever read head is closer to the data reads it), or two requests can be serviced concurrently. Data must be written twice, once to each disk, but if the system takes this into account when it is designed, the impact can be minimized. RAID 1 offers very high data reliability and improved performance for read-intensive applications. However, it does not offer total system fault tolerance because the disk drives are only part of the system. Also, it can be expensive, since storage requirements are doubled (2x of disk space is needed to store 1x of data).
RAID 3
    RAID Level 3 is similar to RAID 4; there are a number of data drives, and a dedicated parity drive which does error-checking. RAID 3 interleaves data on a bit-by-bit basis among the data drives, whereas RAID 4 is block-by-block. The parity drive, where the data recovery information and the error-checking is stored, is an I/O bottleneck, as in RAID 4. RAID 3 is very good for data transfer-intensive applications, such as CAD/CAM or engineering on a LAN. The RAID Advisory Board states that "RAID Level 3 is not well suited for transaction processing or other I/O request-intensive applications unless it is assisted by some other technology such as caching."
RAID 5
    The dedicated parity drive of RAID 3 and RAID 4 is eliminated. Data is written by the controller one block at a time, and the parity information is interleaved among all of the disks, thereby gaining performance and redundancy. However, RAID 5 suffers a write penalty, since the parity information must be changed for updates. RAID 5 arrays are often supplemented with large memory caches or write logging to improve write performance. On the plus side, RAID 5 reduces I/O bottlenecks since there is no dedicated parity drive.

One magazine article that I used for this report made an important point: "A common myth associated with RAID is that it always increases performance. In all cases, access time is increased because of the extra disk sets required. Because of its interleaved parity, RAID 5 is perhaps the best-performing high-level RAID technique."

The following is the characterization of RAID 5 by the RAID Advisory Board:
"RAID Level 5 arrays perform best in applications whose data and I/O characteristics match their capabilities:
- data whose enhanced availability is worth protecting, but for which the value of full disk mirroring is questionable.
- high read request rates,
- small percentage of writes in the I/O load.
Inquiry-type transaction processing, group office automation, on-line customer service departments, etc. are all examples of applications where RAID Level 5 can be used effectively. High-speed data collection from a process, or credit bureau verification in which balances are continually updated are examples of applications unsuited for RAID Level 5."
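To make the parity idea and the RAID 5 write penalty concrete, here is a minimal Python sketch. It assumes the parity block is the bytewise XOR of the data blocks in a stripe, which is the usual scheme but is not spelled out above. The function names, the block contents, and the particular parity rotation are all made up for illustration; real controllers use various layouts.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def regenerate(surviving_blocks):
    """Rebuild the one missing block of a stripe from all the survivors
    (the remaining data blocks plus the parity block)."""
    return xor_blocks(surviving_blocks)


def update_parity(old_parity, old_data, new_data):
    """Small write: the old data and old parity must be read before the
    new data and new parity can be written. This is the write penalty."""
    return xor_blocks([old_parity, old_data, new_data])


def parity_disk_for_stripe(stripe, n_disks):
    """One possible rotation of the parity block across the disks
    (an assumed layout, chosen only to show the interleaving)."""
    return (n_disks - 1) - (stripe % n_disks)


# One stripe on a 5-disk array: four data blocks plus one parity block.
data = [b"ORCL", b"RAID", b"DISK", b"5555"]
parity = xor_blocks(data)

# The disk holding data[2] fails; rebuild its block from the rest.
rebuilt = regenerate([data[0], data[1], data[3], parity])
print(rebuilt)  # b'DISK'

# Updating one data block also forces a parity update.
new_parity = update_parity(parity, data[1], b"TEST")
assert new_parity == xor_blocks([data[0], b"TEST", data[2], data[3]])
```

A single small write costs two reads and two writes here, which is why a read-heavy load suits RAID 5 and a write-heavy one does not.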
OTHER RAID LEVELS
RAID levels 1 through 5 currently guard against single disk failure, but not multiple disk failure. RAID Level 6 addresses this issue, as well as improving I/O performance further. RAID 6 has only been described in papers, however. There have not been any commercial implementations (3/94). RAID 0 and RAID 1 are sometimes combined, to bring together the mirroring of RAID 1 and the "performance" of RAID 0. This combination is sometimes called RAID 10. RAID 53 is a combination of RAID 0 and RAID 3, providing striping and high data-transfer capacity.
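As a rough picture of how striping and mirroring combine, the Python sketch below maps a logical block number to physical locations. It assumes a stripe unit of one block and a made-up numbering in which mirrored pair k consists of disks 2k and 2k+1. Neither the function names nor the layout come from any particular product.

```python
def raid0_location(block, n_disks):
    """Striping: consecutive logical blocks go to consecutive disks.
    Returns (disk index, block offset on that disk)."""
    return block % n_disks, block // n_disks


def raid10_locations(block, n_pairs):
    """Striping across mirrored pairs: the block is striped to one pair,
    then written to both disks of that pair (assumed numbering)."""
    pair, offset = raid0_location(block, n_pairs)
    return [(2 * pair, offset), (2 * pair + 1, offset)]


print(raid0_location(7, 4))    # (3, 1)
print(raid10_locations(5, 2))  # [(2, 2), (3, 2)]
```

Either copy in a pair can serve a read, and losing one disk of a pair loses no data, at the price of twice the disks.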
HOW TO CHOOSE
One magazine article that I have read listed the following chart:

How to Choose a RAID Level
--------------------------

Cost                            RAID Solution
----                            -------------
Less than 4 GB                  RAID 1 (mirroring)
Greater than 4 GB               RAID 5

Performance                     RAID Solution
-----------                     -------------
Speed, throughput more          RAID 0 (disk striping, no
important than reliability      fault tolerance)

High data transfer              RAID 3
requirement

High I/O performance and        Combination of RAID 0 and 1
high availability               (disk striping across RAID 0
                                mirrored drives, sometimes
                                known as RAID 10)

Symmetrical data transfer       Combination of RAID 0 and 3
and I/O rate performance        (disk striping across RAID 3
                                "member disks")

What ALL of the reports and articles say definitively is that the only way to match a RAID level and system to an application is to do an on-site, real-world evaluation.
SCALABILITY
The maximum configuration for each RAID solution is an important consideration.
OTHER SOLUTIONS
There are other mass-storage strategies available, such as Hierarchical Storage Management (HSM), or Network File System (NFS) Mirroring. RAID is the most popular solution right now in the marketplace. I'm not sure if that is an endorsement, or just a fad.
QUESTIONS TO ASK
As kind of a summary, I will try to put together some questions to ask ourselves and whoever may provide a RAID solution:
What level of fault tolerance does the system require?
How is the RAID array implemented (software or hardware)? What is the impact on the rest of the system (CPU)?
What is the MTBF (Mean Time Between Failure) for the RAID disks? For the RAID controllers? What is the overall MTBF?
What kind of SCSI controllers does the system use? SCSI-2?
What is the maximum configuration of the RAID solution? Can it handle the growth of the system?
Is the performance of the RAID array affected by the number of drives?
Is system performance affected when the drive is fully loaded?
And all of these are minor compared to the biggie: which RAID Level is best suited to the system?