Re: Recovery from RAID5 Failure Questions
Trevor Williams wrote:
>
> I would appreciate hearing about DBAs' experiences with handling disk
> problems where a failing RAID5 disk holds database files. In particular,
> I am interested in RAID5 on Solaris.
I'm using RAID5 with OpenVMS, but I don't think the OS should really affect the answers to these questions.
> Do databases continue to function after a single disk failure? How well?
As long as your RAID controller properly implements RAID5 they do.
I don't have any benchmark information about how much the DB slowed down, but it wasn't very noticeable.
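For anyone wondering why a single failure is survivable: RAID5 keeps a parity block per stripe, and the parity is the XOR of the data blocks, so any one missing block can be recomputed from the others. Here is a minimal sketch of that idea in Python. The block contents, the four-disk layout and the xor_blocks helper are made up for illustration; a real controller does this in hardware or firmware and rotates parity across the disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks from one stripe, as they might sit on three disks
# (hypothetical contents).
data = [b"ORCL", b"DATA", b"FILE"]

# The parity block for the stripe, stored on a fourth disk.
parity = xor_blocks(data)

# The disk holding data[1] fails: rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

The same calculation only works with one block missing per stripe, which is why a second disk failure before the rebuild finishes means going to backups.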
> How trivial is the replacing of the failed disk? How can the outage time
> be minimised?
What you have to do to replace the disk might depend on your controller
and its configuration.
When we first got RAID5, the controller was set (either by default or by the vendor) to disallow hot-swapping, which was unfortunate since that was part of the reason we wanted RAID5!
When a disk failed in that circumstance, it could only be replaced by shutting down the system, replacing the disk, and running a standalone program to rebuild the failed disk's contents on the new disk. Then the system could be brought back up. This worked fine, but took at least an hour for a 2 GB disk. Since the controller fills in for the failed disk, you could wait 'til a less busy time of day to do this; but if another disk fails while you're waiting, you'll have to go to backups.
Once we were able to enable hot-swapping (which required reinitializing the RAID devices and restoring from backups), things got much easier. Now, if a disk fails, we simply pop it out and pop in a new one, and the controller rebuilds it on the fly. Of course, this slows the system down even more, but still the effect is not great.
> After replacing a broken disk and getting Solaris to synchronise
> it, is it necessary to do a database recovery?
It's never been necessary for me. The idea of RAID5 is that the controller maintains the integrity of the RAID device when a single disk fails. Since this is happening in the controller, Oracle and the OS both go happily about their normal business. Once again, the only problem you have to worry about is if a second disk fails before the first one is completely rebuilt.
Received on Thu Jan 23 1997 - 00:00:00 CST