
Re: Storage array advice anyone?

From: <chris_at_thedunscombes.f2s.com>
Date: Tue, 14 Dec 2004 10:47:20 +0000
Message-ID: <1103021240.41bec4b82ec39@webmail.freedom2surf.net>


Stephen,

This is a classic debate / argument that's been going on in one form or another for years. Anyway, for what it's worth, here's my "two penny worth", or 2 cents if you prefer.

Last year I was involved in setting up an IBM ESS 8000 "Shark" which had 80 x 146 GB drives, so similar to, though not quite the capacity of, your array. We had to work through a number of decisions:

  1. RAID 5 or 10? (I've heard some people say that on the Shark RAID 5 is actually RAID 4, but I don't want to go there now.) There are the usual trade-offs: RAID 5 gives more capacity; RAID 10 gives better protection against disk failures, although with hot spares etc. you'd have to be very, very unlucky to suffer data loss using RAID 5.

  On performance RAID 10 is generally better, but it depends on things such as the read/write ratio and whether the RAID 10 implementation reads from both plexes or only from the primary. RAID 5 suffers when there's been a disk failure, especially while it's re-building the disk onto the hot spare.

In our situation RAID 5 was chosen due to price and capacity requirements. Also, given the nature of our Oracle databases, the performance benefits of RAID 10 were likely to be marginal except when recovering from a disk failure.
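To put rough numbers on the capacity side of that trade-off, here's a quick Python sketch. The disk count and size below are illustrative placeholders (loosely based on the 8-disk RAID 5 groups and 146 GB drives mentioned in this thread), not anyone's exact configuration:

  # Rough usable-capacity comparison for RAID 5 vs RAID 10.
  # The numbers are illustrative, not a real configuration.

  def raid5_usable_gb(disks_per_group, groups, disk_gb):
      # One disk's worth of capacity per group goes to parity.
      return groups * (disks_per_group - 1) * disk_gb

  def raid10_usable_gb(total_disks, disk_gb):
      # Half the disks hold mirror copies.
      return (total_disks // 2) * disk_gb

  disk_gb = 146
  groups = 10            # e.g. ten 8-disk RAID 5 sets
  disks_per_group = 8
  total_disks = groups * disks_per_group

  print("RAID 5 usable:  %d GB" % raid5_usable_gb(disks_per_group, groups, disk_gb))
  print("RAID 10 usable: %d GB" % raid10_usable_gb(total_disks, disk_gb))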

2. Striping etc

The first question you need to ask is:
  "Do I have different workloads e.g. dev, live, performance critical databases etc ?"
  If you do, which is likely, then you need to decide whether you want to segregate the workloads / databases onto separate groups of disks to avoid any performance contention (at the disk level; you can't avoid it at the cache level) between the workloads / databases. James Morle has written an excellent paper on this, "Sane SAN"; it should be available on his website scaleabilities.co.uk

In our case we effectively had a single critical workload (a group of databases and flat files). When this workload was running, nothing else would be running. So to maximise performance we did the following:

  1. Divided each disk group (a set of 8 disks as a RAID 5 set) into 20 GB LUNs, i.e. "disks" / physical volumes (PVs) from the OS point of view.
  2. Created volume groups made up of an equal number of LUNs from each disk group, e.g. VG02 contained 2 LUNs from each of the 10 disk groups, so 400 GB.
  3. Created filesystems from these volume groups, striped with a 4 MB stripe size across all disks in the VG. This was done using "extent based striping" performed by the volume manager (both HP-UX and AIX).

This meant that our critical workload had access to all the physical disks all the time and the IO was evenly spread across all of them, hence maximising performance.
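To make that layout concrete, here's a small Python sketch of the idea. The numbers mirror the example above (10 disk groups, 20 GB LUNs, 2 LUNs per group in the VG, 4 MB stripes); the simple round-robin extent placement is a simplification of what the volume manager actually does, just to show why IO ends up spread over every disk group:

  # Simplified model of the layout above: 10 RAID 5 disk groups, each
  # carved into 20 GB LUNs, a volume group built from 2 LUNs per group,
  # and 4 MB extents placed round-robin so IO touches every disk group.

  DISK_GROUPS = 10
  LUNS_PER_GROUP = 2
  LUN_GB = 20
  STRIPE_MB = 4

  # Order the LUNs so consecutive extents land on different disk groups.
  vg_luns = ["dg%02d_lun%d" % (dg, lun)
             for lun in range(LUNS_PER_GROUP)
             for dg in range(DISK_GROUPS)]

  print("VG size: %d GB from %d LUNs" % (len(vg_luns) * LUN_GB, len(vg_luns)))

  def lun_for_offset(offset_mb):
      # With extent-based striping, each successive 4 MB extent sits on
      # the next LUN in the volume group.
      extent = offset_mb // STRIPE_MB
      return vg_luns[extent % len(vg_luns)]

  for off in (0, 4, 8, 36, 80):
      print("offset %3d MB -> %s" % (off, lun_for_offset(off)))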

If you decide you want to segregate workloads then you need to allocate a physically separate group of disks to each workload. Then I'd suggest you stripe across each separate group of disks as shown above. So you might end up with 3 or 4 groups of disks, each with its own separate striping.

When doing this you need to bear in mind what you are going to do when extra capacity is added in a year or two's time. This can be quite a challenge.

Hope that all made sense.

3. Disk failures

My experience is that with either RAID 5 or 10 you have to be unbelievably unlucky to lose data, provided disks are replaced when they fail and not left for a few days or even more. The probability is extremely remote. It might be an idea to get someone to do the maths and work out the probabilities.
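For what it's worth, the back-of-the-envelope version of that maths is easy to sketch in Python. The MTBF and rebuild window below are assumptions picked purely for illustration, not vendor figures:

  # Back-of-the-envelope odds of a second disk in the same RAID 5 group
  # failing while the first is still rebuilding onto the hot spare.
  # MTBF and rebuild time are illustrative assumptions only.

  mtbf_hours = 500000       # assumed per-disk MTBF
  rebuild_hours = 6         # assumed rebuild time onto the hot spare
  disks_per_group = 8

  # Approximate a constant failure rate (exponential lifetimes) and ask:
  # what's the chance any of the 7 surviving disks fails in the window?
  rate = 1.0 / mtbf_hours
  p_second_failure = 1 - (1 - rate * rebuild_hours) ** (disks_per_group - 1)

  print("P(second failure during rebuild) ~ %.5f%%" % (p_second_failure * 100))

Whatever figures you plug in, the point is that the exposure window is the rebuild time; the real risk is a failed disk left unreplaced for days.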

Well I hope that helps.

Chris

PS I know about BAARF and in an ideal world we wouldn't use RAID 5 but sometimes when managers are managers and bean counters are counting their beans you can't justify RAID 10 over RAID 5. You just need to make management aware of the trade-offs and understand the implications of the decision.

Quoting Stephen Lee <Stephen.Lee_at_DTAG.Com>:

>
> There is a little debate going on here about how best to setup a new
> system which will consist of IBM pSeries and a Hitachi TagmaStore 9990
> array of 144 146-gig drives (approx. 20 terabytes). One way is to go
> with what I am interpreting is the "normal" way to operate where the
> drives are all aggregated as a big storage farm -- all reads/writes go
> to all drives. The other way is to manually allocate drives for
> specific file systems.
>
> Some around here are inclined to believe the performance specs and
> real-world experience of others that say the best way is keep your hands
> off and let the storage hardware do its thing.
>
> Others want to manually allocate drives for specific file systems.
> Although they might be backing off (albeit reluctantly) on their claims
> that it is required for performance reasons, they still insist that
> segregation is required for fault tolerance. Those opposed to that
> claim insist that the only way (practically speaking) to lose a file
> system is to lose the array hardware itself in which case all is lost
> anyway no matter how the drives were segregated, and if they really
> wanted fault tolerance they would have bought more than one array. And
> around and around the arguments go.
>
> Is there anyone on the list who would like to weigh in with some real
> world experience and knowledge on the subject of using what I suppose is
> a rather beefy, high-performance array.
>
> --
> http://www.freelists.org/webpage/oracle-l
>

Chris Dunscombe

Christallize Ltd

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Dec 14 2004 - 04:46:49 CST
