Re: SAME methodology questions...
Ron Bergeron <bergeror_at_asitshouldbe.com> wrote in message news:<E2yEa.86017$DV.100771_at_rwcrnsc52.ops.asp.att.net>...
> I've been reading about the SAME (Stripe And Mirror Everything)
> methodology recently. I understand the basic concept and the claim as to
> why it is a Good Thing. However, I'm unclear as to the actual
> implementation of it, especially in cases where a hardware RAID array is
> involved.
>
> The things I've read say to use a 1 MB stripe depth and 1 MB I/O size to
> stripe across as many physical spindles as possible. Mirror everything
> to another set of disks. 1 MB of data would be written to a disk before
> moving on to the next disk. That concept is simple enough.
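A quick way to see that round-robin mapping is a few lines of plain Python (nothing vendor-specific; the 1 MB stripe unit is the figure from your post, the 4-column count is just an example):

# Minimal sketch: which stripe column (disk) a logical offset lands on,
# assuming a 1 MB stripe unit and a hypothetical 4-column stripe.
STRIPE_UNIT = 1 * 1024 * 1024   # 1 MB stripe depth
COLUMNS = 4                     # example column count

def column_for_offset(byte_offset):
    """Return the stripe column (disk index) holding this offset."""
    return (byte_offset // STRIPE_UNIT) % COLUMNS

# A 1 MB-aligned, 1 MB-sized I/O stays on a single column, so one disk
# services it in (roughly) one seek.
for mb in range(6):
    print("offset %d MB -> column %d" % (mb, column_for_offset(mb * STRIPE_UNIT)))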
>
> Assume I'm in a Sun/Veritas/EMC (Clariion)/Oracle environment.
>
> The Clariion has a limitation of striping across a maximum of 8 disks
> while at the same time mirroring to another set of 8 disks (RAID 1+0).
> (It can do a stripe across 16 disks, but to both stripe and mirror, it
> can only go 8 disks wide). For manageability, I'd prefer to stripe across
> sets of 4 disks and mirror to another 4 disks. (More on this later.)
>
> To increase the number of physical spindles, I want to set up several of
> these 4+4 disk RAID 1+0 sets of disks and then stripe across them using
> Veritas Volume Manager. Each set of 4+4 disks would be one LUN. In this
> case, for every 1 "disk" (LUN) that the OS sees, there would actually be
> 8 physical disks (4-disk wide stripe, mirrored). For example, if I'm
> using VxVM to stripe across 4 "disks" (actually LUNs), there would
> actually be 32 physical disks (16 disk stripe mirrored to another 16 disks).
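Just to make the layering concrete, a back-of-the-envelope sketch (figures taken straight from your description):

# Spindle count for the layered layout described above.
vxvm_columns    = 4   # LUNs that VxVM stripes across
hw_stripe_width = 4   # data disks per LUN inside the Clariion
mirror_copies   = 2   # RAID 1+0: every data disk is mirrored

data_spindles  = vxvm_columns * hw_stripe_width    # 16
total_spindles = data_spindles * mirror_copies     # 32
print("%d data spindles, %d physical disks" % (data_spindles, total_spindles))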
>
> My confusion is with the recommended 1 MB stripe depth. From what I've
> read, 1 MB was chosen because if a disk is writing 1 MB of data at a
> time, only one seek is required for that write size. 1 MB per seek can
> be considered a sequential write and is more efficient. Anything larger
> than 1 MB barely increases the efficiency.
>
> So, is the 1 MB stripe depth at the Volume Manager level or at the
> physical disk level? Here's why I ask:
>
> If I do an 8 disk RAID 1+0 (4 disk stripe, mirrored) on my hardware RAID
> array and present that as a single LUN to my server, a 1 MB write to
> that LUN will result in 256KB written to each physical disk (if I'm
> striping evenly across the disks for that I/O size). A 256KB write to
> the physical disk is less efficient than the 1 MB write that all the
> SAME whitepapers I've read talk about. On the other hand, if I stripe at
> the hardware level with a 1 MB stripe depth across 4 disks, that results
> in a 4 MB stripe depth at the VxVM level to maintain the efficiency.
> Neither of those choices seems quite right.
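The arithmetic behind those two options, spelled out (plain Python, sizes from your post; this only looks at the I/O size that reaches each physical disk):

MB = 1024 * 1024
hw_columns = 4                 # data disks inside one Clariion RAID 1+0 LUN

# Option 1: 1 MB stripe unit at the VxVM level; the hardware then splits
# that 1 MB evenly across its 4 data disks.
per_disk_io = (1 * MB) // hw_columns          # 256 KB per physical disk

# Option 2: keep 1 MB per physical disk by striping at 1 MB in hardware,
# which pushes the VxVM stripe unit up to 4 MB.
vxvm_unit_needed = (1 * MB) * hw_columns      # 4 MB at the VxVM level

print("Option 1: %d KB per physical disk" % (per_disk_io // 1024))
print("Option 2: %d MB VxVM stripe unit" % (vxvm_unit_needed // MB))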
>
> Does anyone else have a similar SAME/Oracle/hardware RAID situation?
> What have you done and how has it worked out?
>
> I'm really not interested in a religious debate about the merits (or
> lack thereof) of the SAME methodology. I'm just looking for people who
> have done something similar to what I've described.
>
> The reason I would go with the 4-disk wide stripe (8-disk RAID 1+0) is
> to simplify adding disks in the future. I would add disks one 4x4 set
> at a time and then do a background relayout at the VxVM level to
> maintain the SAME principle. The 8-disk (4x4) increment would be simpler
> and less expensive than going with a 16-disk (8x8) increment.
>
> Finally, should this whole set of disks be one gigantic volume or should
> I have a datafile volume, a redo volume, an index volume, etc.? Separate
> volumes would be better from an organizational/administrative point of
> view, but what about performance? Having everything on separate volumes
> would guarantee longer seek times as the heads move between volumes on
> these super-wide stripes. In the long run, I suspect that it really
> won't matter much.
>
> It shouldn't matter, but this is for a multi-terabyte data warehouse
> that is receiving large amounts of new data hourly as well as crunching
> what is already in there.
>
> Thanks for the patience it took to get this far.
>
> Ron
Instead of thinking about what type of datafile would be stored on the volume, think about the access pattern of the volume. Since this is a warehouse database, where a large degree of parallelism will be used for loads and index builds, and where full table scans are common, large stripes make sense, with aggressive read-ahead configured in the storage controller. Are some of the datafiles accessed in a random pattern, where read-ahead would reduce performance? Some controllers support adaptive read-ahead, where no read-ahead takes place until the controller recognizes that a large read operation is in progress.
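To illustrate the adaptive idea, here is a toy sketch in Python - real controllers do this in firmware with their own heuristics, and the threshold and names here are made up:

# Toy sketch of adaptive read-ahead: prefetch nothing until the request
# stream looks sequential. Threshold and prefetch size are arbitrary.
SEQUENTIAL_RUN_THRESHOLD = 4          # adjacent reads before prefetching kicks in
READ_AHEAD_BYTES = 1 * 1024 * 1024    # how much to prefetch once it does

class AdaptiveReadAhead:
    def __init__(self):
        self.last_end = None
        self.run_length = 0

    def on_read(self, offset, length):
        """Return how many bytes to prefetch after this read (0 = none)."""
        if self.last_end is not None and offset == self.last_end:
            self.run_length += 1      # request continues where the last one ended
        else:
            self.run_length = 0       # random access: reset, no read-ahead
        self.last_end = offset + length
        return READ_AHEAD_BYTES if self.run_length >= SEQUENTIAL_RUN_THRESHOLD else 0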
I think that you need to review the difference between RAID 01 and
RAID 10.
When you say "4 disk wide stripe, then mirrored" - that would seem to
be RAID 01 to me. One (destructive) way to tell is to pull 2 drives -
and see what you lose ;).
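If you'd rather not pull drives, a toy model of that test (plain Python, a 4+4 layout, disk labels made up for illustration):

from itertools import combinations

N = 4  # disks per side (4+4 layout)

def raid01_survives(failed):
    # RAID 01: stripe across A0..A3, mirror the whole stripe to B0..B3.
    # Data survives as long as at least one complete side is intact.
    side_a_ok = all(("A", i) not in failed for i in range(N))
    side_b_ok = all(("B", i) not in failed for i in range(N))
    return side_a_ok or side_b_ok

def raid10_survives(failed):
    # RAID 10: mirror pairs (A_i, B_i), then stripe across the pairs.
    # Data survives as long as no mirror pair has lost both members.
    return all(not ({("A", i), ("B", i)} <= failed) for i in range(N))

disks = [(side, i) for side in "AB" for i in range(N)]
pairs = [set(p) for p in combinations(disks, 2)]
fatal01 = sum(not raid01_survives(p) for p in pairs)
fatal10 = sum(not raid10_survives(p) for p in pairs)
print("2-disk failures that lose data: RAID 01 %d/%d, RAID 10 %d/%d"
      % (fatal01, len(pairs), fatal10, len(pairs)))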
RAID 10 would likely support a larger number of smaller accesses better than a RAID 01 layout, but RAID 01 may provide higher throughput for fewer, larger operations (warehouse).
Think in terms of the overhead of reading extra blocks (that likely won't be used) vs. being free to service the next (queued) request. If you don't see a large number of queued operations in the storage controller's management software, then there is likely no benefit in reducing the overhead of the larger stripe size.
Do you know the average write size of the log writer? I'd guess that it is less than the 1 MB stripe size. But this is why they put a large amount of cache in the storage cabinet, right?
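One rough way to estimate it is from the cumulative 'redo size' and 'redo writes' statistics in v$sysstat - the figures below are made-up placeholders, just to show the arithmetic:

# Rough average LGWR write size from two cumulative v$sysstat counters.
redo_size_bytes = 3500000000.0   # v$sysstat "redo size"   (placeholder value)
redo_writes     = 420000         # v$sysstat "redo writes" (placeholder value)

avg_write_kb = redo_size_bytes / redo_writes / 1024
print("average redo write ~ %.0f KB" % avg_write_kb)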
I was surprised at how poorly a CX200 performed in random I/O, but how well it performed in an RMAN backup of a 70 GB database (highly sequential) - with a 256KB stripe size, as W2K doesn't support I/Os larger than that.
Pd