Re: Recovery from RAID5 Failure Questions

From: Dave Costa <dave_at_cswv.com>
Date: 1997/01/23
Message-ID: <32E79886.3660@cswv.com>#1/1

Trevor Williams wrote:
>
> I would appreciate hearing about DBAs' experiences with handling disk
> problems where a failing RAID5 disk holds database files. In particular,
> I am interested in RAID5 on Solaris.

I'm using RAID5 with OpenVMS, but I don't think the OS should really affect the answers to
these questions.  

> Do databases continue to function after a single disk failure? How well?

As long as your RAID controller implements RAID5 properly, they do. I don't have any benchmark figures for how much the DB slowed down, but
it wasn't very noticeable.

> How trivial is the replacing of the failed disk? How can the outage time
> be minimised?

What you have to do to replace the disk might depend on your controller and its configuration.
When we first got RAID5, the controller was set (either by default or by the vendor) to disallow
hot-swapping, which was unfortunate since that was part of the reason we wanted RAID5!
When a disk failed in that circumstance, it could only be replaced by shutting down the system,
replacing the disk, and running a standalone program to rebuild the failed disk's contents on the
new disk. Then the system could be brought back up. This worked fine, but took at least an hour for a
2 GB disk. Since the controller fills in for the failed disk, you could wait 'til a less busy time of
day to do this; but if another disk fails while you're waiting, you'll have to go to backups.
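To put the rebuild figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes "2 GB" means 2 GiB, takes "at least an hour" as exactly one hour, and assumes rebuild time scales linearly with disk size; the function name is hypothetical and none of this comes from the controller's documentation.

```python
# Rough rebuild-rate arithmetic from the figures above (assumptions, not measurements).
DISK_MIB = 2 * 1024          # 2 GB disk, treated as 2 GiB
REBUILD_SECONDS = 60 * 60    # "at least an hour", so this is a best case

# Effective rebuild rate implied by those two numbers.
rate_mib_per_s = DISK_MIB / REBUILD_SECONDS
print(f"{rate_mib_per_s:.2f} MiB/s")   # prints 0.57 MiB/s

def estimate_rebuild_hours(disk_gib, rate=rate_mib_per_s):
    """Estimate rebuild time in hours, assuming the same rate and linear scaling."""
    return disk_gib * 1024 / rate / 3600
```

The point of the estimate is the size of the window during which a second disk failure would force a restore from backups: the downtime itself, plus however long you wait for a quiet period before starting.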

Once we were able to enable hot-swapping (which required reinitializing the RAID devices and restoring from
backups), things got much easier. Now, if a disk fails, we simply pop it out and pop in a new one, and the
controller rebuilds it on the fly. Of course, this slows the system down even more, but still the effect is not great.

> After replacing a broken disk and getting Solaris to synchronise
> it, is it necessary to do a database recovery?

It's never been necessary for me. The idea of RAID5 is that the controller maintains the integrity of the RAID device when a single disk fails. Since this is happening in the controller, Oracle and the OS both go happily about their normal business. Once again, the only problem you have to worry about is if a second disk fails before the first one is completely rebuilt.
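
For anyone wondering why one failure is survivable and two are not, here is a minimal Python sketch of the XOR parity idea behind RAID5. It is an illustration only, not how any particular controller is implemented: the block contents and the xor_blocks helper are made up, and real arrays rotate parity across the disks and work on much larger stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three hypothetical data blocks plus the parity block computed from them.
data = [b"ORCL", b"DATA", b"FILE"]
parity = xor_blocks(data)

# Single failure: the disk holding data[1] dies.
# XOR of the surviving data blocks and the parity gives back the lost block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Double failure: the disks holding data[1] and data[2] both die.
# The survivors only yield the XOR of the two lost blocks, not either one.
leftover = xor_blocks([data[0], parity])
assert leftover == xor_blocks([data[1], data[2]])
```

Because the lost block can always be recomputed from the survivors, reads and writes can carry on during a single failure, which is why no database recovery is needed afterwards. With two blocks missing from the same stripe there is not enough information left, hence the trip to backups.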
