RAID overview

From: louis.avrami..jr <lou2_at_cbnews.att.com>
Date: Mon, 21 Nov 1994 23:26:12 GMT
Message-ID: <Czn53o.1sF_at_nntpa.cb.att.com>


In article <3ag31d$611_at_nova.sti.nasa.gov>, Karen Huguley <khuguley_at_sti.nasa.gov> wrote:
>In article <Cz29Bw.HwL_at_nntpa.cb.att.com>, lou2_at_cbnews.att.com
>(louis.avrami..jr) says:
>>>snip<<
>>
>>If anyone is interested, I'll post a synopsis of RAID issues that I wrote
>>for my group.
>>
>>Lou Avrami ( attmail!lavrami )
>>
 

>>>>snip snip<<<
>
>Thanks for the info. I would be interested in reading your synopsis of
>RAID issues. We are installing RAID for use with our ORACLE server, and
>I am very interested in others' experiences. What level is best for
>ORACLE7? 0,1,3,5?? Tradeoffs?? Any insights/information would be greatly
>appreciated.
>
>Karen Huguley
>DBA
>NASA Center for Aerospace Information

Several folks indicated interest in an overview of RAID that I wrote several months ago, so I thought that I would post it for everyone. My group ended up going with RAID 5. RAID 5 is good for reads and slow for writes, but my system is a little unusual in that the majority of our transactions (>80%) are reads. We do most of our database updates at night, so our users are not affected by the somewhat slower write performance of RAID 5.

We initially chose RAID 1, but found out that our RAID cabinet could only mirror disks within the same rank. Since the cabinet (the NCR 6298) had 5 disks per rank, we would have to "give up" (not use within a RAID array) one disk per rank. Because of the unique aspects of our application with respect to writes, we decided to make do with RAID 5.

I've always thought that the best overall performance for a database application could be gained with RAID 1, with perhaps striping across both disks (a combination of RAID 1 and 0, called RAID 10). This can be very expensive, especially if you want a totally redundant system. You need two of everything (disks, controllers, etc.).

The RAID Advisory Board can be contacted at 13 Marie Lane, St. Peter, MN 56082-9423, tel (507)931-0967, fax (507)931-0976, MCI Mail ID: 470-6032.

Any opinions stated are mine, not my company's.

Hope this is helpful. If anyone would like to ask specific questions, please E-mail me.

Lou Avrami ( attmail!lavrami )

WHAT IS RAID?


        The term RAID originally stood for Redundant Array of Inexpensive Disks. The RAID Advisory Board officially defines RAID as Redundant Array of Independent Disks. I guess they recognized that these systems weren't as cheap as they thought.

        RAID is a collection of hard disks electronically tied together in a single system so that if a single disk drive crashes, the contents of that disk are protected. There are six basic levels of RAID, 0 through 5. A higher number does not mean higher performance; each level is simply a different strategy for accomplishing data storage and protection.

        The RAID Advisory Board defines RAID as "a disk array in which part of the storage capacity is used to store redundant information about user data stored on the remainder of the storage capacity. The redundant information enables REGENERATION of user data in the event that one of the array's member disks or the access path to it fails."
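
        The definition above does not say how the redundant information is computed. The following is only a minimal sketch, assuming the common case where the redundant information is a byte-wise XOR parity block; the block contents and the three-data-disk layout are made up for illustration:

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The redundant information: one parity block covering all three.
parity = xor_blocks(data)

# Simulate losing disk 1, then regenerate it from the survivors plus parity.
survivors = [data[0], data[2], parity]
regenerated = xor_blocks(survivors)
assert regenerated == data[1]
```

        Any single missing block can be rebuilt this way, but two missing blocks cannot, which is why this kind of redundancy only protects against the failure of one disk at a time.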

FAULT TOLERANCE


        Before going into the different RAID levels, the issue of failed components/data availability/downtime should be discussed. There are four different ways in which a failed component (in our case, a failed disk or controller) can be replaced:

        Cold Swap       System operation must stop, and electrical power must
                        be removed before the component is replaced.


        Warm Swap       System operation must stop, but electrical power does
                        not have to be removed before component replacement
                        takes place.

        Hot Swap        System operation does not need to stop, but human
                        intervention is required for component replacement.

        Automatic Swap  The component can be hot swapped, and a replacement
                        component is pre-installed so that no human
                        intervention is needed to bring it into service.
                        Automatic swapping is common with RAID Level 1, 3, 4
                        and 5 arrays.

        The level of data availability that a system requires is important to determine. It could have an impact on the price of a RAID system, how that system performs, as well as how long a system may be unavailable to users if there is a failure.

HARDWARE/SOFTWARE ISSUES


        When implementing RAID, different combinations of hardware and software solutions can be utilized:

        Software-based  The least expensive solution.  This approach steals
            array       CPU cycles from the host computer to do the array
                        processing.  Depending on the host's load, this may or
                        may not affect performance.

        Bus-based       A co-processor is installed on the server to handle
            array       array processing.  Multi-processor platforms may
                        dedicate one or more processors to RAID.

        Hardware-based  An array processor is located in the disk array storage
            array       system.

        A hardware-based array ensures the least impact on the system if a
drive fails; the software approach offers the least protection.

        Another hardware issue is the reliability of the disk drives and the controllers. The MTBF (Mean Time Between Failure) rate of the disks and the controllers is very important. The higher the number, in hours, the better.
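
        To see why per-component MTBF matters more as an array grows, here is a rough sketch. It assumes that components fail independently at a constant rate, which is a simplification, and the 500,000-hour figure is a hypothetical number, not a vendor specification:

```python
HOURS_PER_YEAR = 24 * 365


def mean_time_to_first_failure(component_mtbf_hours, count):
    """Expected hours until the first of `count` identical components fails.

    Assumes independent components with a constant failure rate, so the
    failure rates simply add up.
    """
    return component_mtbf_hours / count


disk_mtbf = 500_000  # hypothetical per-disk MTBF, in hours
for disks in (1, 5, 20):
    hours = mean_time_to_first_failure(disk_mtbf, disks)
    print(f"{disks:2d} disks: first failure expected after about "
          f"{hours:,.0f} hours ({hours / HOURS_PER_YEAR:.1f} years)")
```

        Under these assumptions, the more disks in the array, the sooner some disk is expected to fail, which is the reason the redundancy is needed in the first place.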

        The controllers used by the RAID system are an important factor. Today, the best performance is offered by controllers that meet the SCSI-2 Fast-Wide protocols. Also, the best-performing controllers are those with a powerful on-board CPU.

BASIC RAID LEVELS


        As I said earlier, there are six basic levels of RAID. Two of them, RAID 2 and RAID 4, are usually not used in commercial applications.

        RAID 2 uses bit-interleaving, which breaks data down into bit-sized pieces and distributes them across the different disk drives. Additional disks perform error detection tasks. It takes a lot of disks to perform the error checking, and a lot of computer power. For expensive applications, like oil exploration on a Cray supercomputer, RAID 2 makes sense. For normal business applications, it requires too many disks and is too expensive.

        RAID 4 uses a single parity drive to store the error-checking information that takes up many drives in RAID 2. This parity drive becomes a major bottleneck when performing write operations, since all writes have to go through it. As a result, RAID 4 arrays are generally not available. The RAID Advisory Board states that "application requirements that RAID Level 4 serves ... are unconditionally better served by RAID Level 5."

        So, that leaves RAID Levels 0, 1, 3 and 5:

        RAID 0  Technically, RAID 0 is not really RAID, because it does not
                provide any data redundancy.  RAID 0 is an array which stripes
                data across a number of disks while treating them all as a
                single unit.  If a disk fails, data can rarely be retrieved
                since it has been spread across several disks, and there is no
                redundancy of data.  Disk striping supposedly increases I/O
                performance.

        RAID 1  Disk drives are mirrored; identical data is written to two
                identical drives so that each piece of data is stored twice.
                RAID 1 can read data faster than a single disk because a read
                request can be handled by either drive, or two requests can be
                serviced concurrently - whichever read head is closer to the
                data reads it.  Data must be written twice, to each disk, but
                if the system takes this into account when it is designed, the
                impact can be minimized.  RAID 1 offers very high data
                reliability and improved performance for read-intensive
                applications.  However, it does not offer total system fault
                tolerance because the disk drives are only part of the system.
                Also, it can be expensive, since storage requirements are
                doubled (2x of disk space is needed to store 1x of data).

        RAID 3  RAID Level 3 is similar to RAID 4; there are a number of data
                drives, and a dedicated parity drive which does error-checking.
                RAID 3 interleaves data on a bit-by-bit basis among the data
                drives, whereas RAID 4 is block-by-block.  The parity drive,
                where the data recovery information and the error-checking is
                stored, is an I/O bottleneck, as in RAID 4.  RAID 3 is very
                good for data transfer-intensive applications, such as CAD/CAM
                or engineering on a LAN.  The RAID Advisory Board states that
                "RAID Level 3 is not well suited for transaction processing
                or other I/O request-intensive applications unless it is
                assisted by some other technology such as caching."

        RAID 5  The parity drive of RAID 3 and RAID 4 is eliminated.  Data is
                written by the controller one block at a time, and the parity
                information is interleaved among all of the disks, thereby
                gaining performance and redundancy.  However, RAID 5 suffers
                a write penalty, since the parity information must be changed
                for updates.  RAID 5 arrays are often supplemented with large
                memory caches or write logging to improve write performance.
                On the plus side, RAID 5 reduces I/O bottlenecks since there
                is no dedicated parity drive.  One magazine article that I used
                for this report made an important point: "A common myth
                associated with RAID is that it always increases performance.
                In all cases, access time is increased because of the extra
                disk sets required.  Because of its interleaved parity, RAID 5
                is perhaps the best-performing high-level RAID technique."
                The following is the characterization of RAID 5 by the RAID
                Advisory Board:


"RAID Level 5 arrays perform best in applications whose data and I/O characteristics match their capabilities:

  • data whose enhanced availability is worth protecting, but for which the value of full disk mirroring is questionable.
  • high read request rates,
  • small percentage of writes in the I/O load.

Inquiry-type transaction processing, group office automation, on-line customer service departments, etc. are all examples of applications where RAID Level 5 can be used effectively. High-speed data collection from a process, or credit bureau verification in which balances are continually updated are examples of applications unsuited for RAID Level 5."
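
        The write penalty mentioned above can be illustrated with a small sketch. It assumes XOR parity and the usual read-modify-write approach to small updates; the function names and block contents are hypothetical, and real controllers may use caching or write logging to hide some of this cost:

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data, old_parity, new_data):
    """Update one block in a stripe; return (new_parity, disk_io_count).

    Read-modify-write: read the old data and old parity (2 reads), then
    write the new data and new parity (2 writes). A non-redundant disk
    would need only 1 write.
    """
    new_parity = xor(xor(old_parity, old_data), new_data)
    return new_parity, 4


# Hypothetical stripe of three data blocks and their parity.
d0, d1, d2 = b"\x0f\x0f", b"\xf0\xf0", b"\x33\x33"
parity = xor(xor(d0, d1), d2)

# Overwrite d1 without touching d0 or d2.
n1 = b"\x55\xaa"
new_parity, ios = small_write(d1, parity, n1)

# The updated parity matches a full recomputation over the new stripe.
assert new_parity == xor(xor(d0, n1), d2)
```

        Under these assumptions each small update costs four disk operations instead of one, which is consistent with RAID 5 suiting loads with a small percentage of writes.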

OTHER RAID LEVELS


        RAID levels 1 through 5 currently guard against single disk failure, but not multiple disk failure. RAID Level 6 addresses this issue, as well as improving I/O performance further. RAID 6 has only been described in papers, however. There have not been any commercial implementations as of 3/94. RAID 0 and RAID 1 are sometimes combined, to bring together the mirroring of RAID 1 and the "performance" of RAID 0. This combination is sometimes called RAID 10. RAID 53 is a combination of RAID 0 and RAID 3, providing striping and high data-transfer capacity.
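
        As an illustration of the RAID 0 plus RAID 1 combination, the sketch below maps a logical block number to a mirrored pair of disks. The layout (round-robin striping over pairs, with disks numbered two per pair) is an assumption made for the example; actual products may place data differently:

```python
def raid10_locate(logical_block, mirrored_pairs):
    """Map a logical block to ((primary_disk, mirror_disk), block_offset).

    Striping (RAID 0): consecutive blocks rotate across the mirrored pairs.
    Mirroring (RAID 1): each block is stored on both disks of its pair.
    """
    pair = logical_block % mirrored_pairs
    offset = logical_block // mirrored_pairs
    primary = 2 * pair
    mirror = primary + 1
    return (primary, mirror), offset


# Four disks arranged as two mirrored pairs: disks (0, 1) and (2, 3).
for block in range(6):
    disks, offset = raid10_locate(block, mirrored_pairs=2)
    print(f"logical block {block} -> disks {disks}, offset {offset}")
```

        Consecutive blocks land on different pairs, so they can be accessed in parallel, while each block survives the loss of either disk in its pair.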

HOW TO CHOOSE


        One magazine article that I have gives the following chart:

                        How to Choose a RAID Level
                        --------------------------

                        Cost            RAID Solution
                        ----            -------------

                    Less than 4 GB      RAID 1 (mirroring)

                    Greater than 4 GB   RAID 5


                   Performance                  RAID Solution
                   -----------                  -------------

                Speed, throughput more          RAID 0 (disk striping, no
                important than reliability      fault tolerance)

                High data transfer              RAID 3
                requirement

                High I/O performance and        Combination of RAID 0 and 1
                high availability               (disk striping across RAID 1
                                                mirrored drives, sometimes
                                                known as RAID 10)

                Symmetrical data transfer       Combination of RAID 0 and 3
                and I/O rate performance        (disk striping across RAID 3
                                                "member disks")


        What ALL of the reports and articles say definitively is that the only
way to match a RAID level and system to an application is to do an on-site, real-world evaluation.
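
        Part of the cost side of the choice is how much of the raw disk space is left for user data. Here is a rough sketch, assuming identical disks, simple two-way mirroring for RAID 1, and one disk's worth of parity for RAID 3, 4 and 5; the 2 GB drive size is a hypothetical figure:

```python
def usable_disks(level, disks):
    """Number of disks' worth of user data for a given RAID level."""
    if level == 0:
        return disks        # striping only, no redundancy
    if level == 1:
        return disks // 2   # mirrored pairs; an odd disk is left unused
    if level in (3, 4, 5):
        return disks - 1    # one disk's worth of space holds parity
    raise ValueError(f"unsupported RAID level: {level}")


DISK_GB = 2          # hypothetical drive size
DISKS_PER_RANK = 5   # e.g. a cabinet with 5 disks per rank

for level in (0, 1, 3, 5):
    usable = usable_disks(level, DISKS_PER_RANK) * DISK_GB
    raw = DISKS_PER_RANK * DISK_GB
    print(f"RAID {level}: {usable} GB usable out of {raw} GB raw")
```

        With five disks per rank, mirroring within the rank leaves one disk unused and yields two disks' worth of data, while RAID 5 yields four. This says nothing about performance, which still has to be measured on site.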

SCALABILITY


        The maximum configuration for each RAID solution is an important consideration.

OTHER SOLUTIONS


        There are other mass-storage strategies available, such as Hierarchical Storage Management (HSM), or Network File System (NFS) Mirroring. RAID is the most popular solution right now in the marketplace. I'm not sure if that is an endorsement, or just a fad.

QUESTIONS TO ASK


        As kind of a summary, I will try to put together some questions to ask ourselves and whoever may provide a RAID solution:

        What level of fault tolerance does the system require?

        How is the RAID array implemented (software or hardware)?
        What is the impact on the rest of the system (CPU)?

        What is the MTBF (Mean Time Between Failure) for the RAID disks?
        For the RAID controllers?  What is the overall MTBF?

        What kind of SCSI controllers does the system use?  SCSI-2?

        What is the maximum configuration of the RAID solution?  Can it handle
        the growth of the system?

        Is the performance of the RAID array affected by the number of drives?

        Is system performance affected when the drive is fully loaded?


        And all of these are minor compared to the biggie: which RAID Level
        is best suited to the system?
Received on Tue Nov 22 1994 - 00:26:12 CET
