
SAME methodology questions...

From: Ron Bergeron <bergeror_at_asitshouldbe.com>
Date: Sun, 08 Jun 2003 03:27:33 GMT
Message-ID: <E2yEa.86017$DV.100771@rwcrnsc52.ops.asp.att.net>


I've been reading about the SAME (Stripe And Mirror Everything) methodology recently. I understand the basic concept and the claim as to why it is a Good Thing. However, I'm unclear as to the actual implementation of it, especially in cases where a hardware RAID array is involved.

The things I've read say to use a 1 MB stripe depth and 1 MB I/O size to stripe across as many physical spindles as possible. Mirror everything to another set of disks. 1 MB of data would be written to a disk before moving on to the next disk. That concept is simple enough.
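
Just to make sure I have the concept right, here's a rough sketch of that round-robin mapping in Python (the 1 MB stripe unit and 8-disk width are only illustrative numbers, not anything a particular array does):

    # Rough sketch of round-robin striping: map a logical byte offset to
    # (disk index, offset within that disk) for a given stripe unit and width.
    # The 1 MB unit and 8-disk width are illustrative, not a recommendation.

    STRIPE_UNIT = 1024 * 1024   # 1 MB stripe depth
    STRIPE_WIDTH = 8            # spindles in the stripe

    def map_offset(logical_offset, stripe_unit=STRIPE_UNIT, width=STRIPE_WIDTH):
        chunk = logical_offset // stripe_unit            # which 1 MB chunk
        disk = chunk % width                             # round-robin across disks
        disk_offset = (chunk // width) * stripe_unit + logical_offset % stripe_unit
        return disk, disk_offset

    # A contiguous 8 MB write touches each of the 8 disks exactly once,
    # 1 MB per disk -- one seek per spindle for the whole write.
    for mb in range(8):
        disk, off = map_offset(mb * STRIPE_UNIT)
        print(f"logical MB {mb} -> disk {disk}, disk offset {off}")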

Assume I'm in a Sun/Veritas/EMC (Clariion)/Oracle environment.

The Clariion has a limitation of striping across a maximum of 8 disks while at the same time mirroring to another set of 8 disks (RAID 1+0). (It can do a stripe across 16 disks, but to both stripe and mirror, it can only go 8 disks wide.) For manageability, I'd prefer to stripe across sets of 4 disks and mirror to another 4 disks. (More on this later.)

To increase the number of physical spindles, I want to set up several of these 4+4 disk RAID 1+0 sets and then stripe across them using Veritas Volume Manager. Each set of 4+4 disks would be one LUN. In this case, for every one "disk" (LUN) that the OS sees, there would actually be 8 physical disks (a 4-disk-wide stripe, mirrored). For example, if I use VxVM to stripe across 4 "disks" (actually LUNs), that would be 32 physical disks (a 16-disk stripe mirrored to another 16 disks).

My confusion is with the recommended 1 MB stripe depth. From what I've read, 1 MB was chosen because if a disk is writing 1 MB of data at a time, only one seek is required for that write size. 1 MB per seek can be considered a sequential write and is more efficient. Anything larger than 1 MB barely increases the efficiency.
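
To put rough numbers on that reasoning (these are assumed figures for a drive of this vintage, about 8 ms average positioning time and 40 MB/s sustained transfer, not numbers from the whitepapers), the fraction of disk time actually spent transferring looks like this:

    # Back-of-the-envelope: fraction of disk time spent transferring data
    # (vs. positioning) for various I/O sizes. The 8 ms positioning time and
    # 40 MB/s transfer rate are assumptions for illustration only.

    SEEK_S = 0.008            # average seek + rotational latency, seconds
    XFER_BPS = 40 * 1024**2   # sustained transfer rate, bytes/second

    def efficiency(io_bytes):
        xfer_s = io_bytes / XFER_BPS
        return xfer_s / (SEEK_S + xfer_s)

    for size_kb in (64, 256, 1024, 4096):
        print(f"{size_kb:>5} KB per seek -> {efficiency(size_kb * 1024):5.1%} transferring")

    # Roughly 16% at 64 KB, 44% at 256 KB, 76% at 1 MB, 93% at 4 MB --
    # going past 1 MB buys comparatively little.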

So, is the 1 MB stripe depth at the Volume Manager level or at the physical disk level? Here's why I ask:

If I do an 8-disk RAID 1+0 (4-disk stripe, mirrored) on my hardware RAID array and present that as a single LUN to my server, a 1 MB write to that LUN will result in 256 KB written to each physical disk (if I'm striping evenly across the disks for that I/O size). A 256 KB write to the physical disk is less efficient than the 1 MB write that all the SAME whitepapers I've read talk about. On the other hand, if I stripe at the hardware level with a 1 MB stripe depth across 4 disks, that forces a 4 MB stripe depth at the VxVM level to maintain the efficiency. Neither of those choices seems quite right.
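
Spelled out with the sizes from my example (Python again; this just assumes writes split evenly across the hardware stripe and ignores alignment and array caching, so it's an illustration, not vendor behaviour):

    # How big an I/O actually reaches each physical spindle when VxVM stripes
    # over LUNs that are themselves hardware stripes. Assumes the write splits
    # evenly and ignores partial-stripe alignment and array caching.

    MB = 1024 ** 2

    def per_disk_io(vxvm_stripe_unit, hw_stripe_unit, hw_stripe_width):
        """I/O size each physical disk sees when VxVM hands one full
        stripe-unit-sized chunk to a single LUN."""
        disks_touched = min(hw_stripe_width, max(1, vxvm_stripe_unit // hw_stripe_unit))
        return vxvm_stripe_unit // disks_touched, disks_touched

    # Choice 1: 1 MB VxVM stripe unit over a LUN striped 4 wide with 256 KB units
    io, n = per_disk_io(1 * MB, 256 * 1024, 4)
    print(f"Choice 1: {io // 1024} KB to each of {n} physical disks")   # 256 KB x 4

    # Choice 2: 1 MB hardware stripe unit, which forces a 4 MB VxVM stripe unit
    # to keep a full 1 MB going to each of the 4 spindles
    io, n = per_disk_io(4 * MB, 1 * MB, 4)
    print(f"Choice 2: {io // 1024} KB to each of {n} physical disks")   # 1024 KB x 4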

Does anyone else have a similar SAME/Oracle/hardware RAID situation? What have you done and how has it worked out?

I'm really not interested in a religious debate about the merits (or lack thereof) of the SAME methodology. I'm just looking for people who have done something similar to what I've described.

The reason I would go with the 4-disk-wide stripe (8-disk RAID 1+0) is to simplify adding disks in the future. I would add disks one 4+4 set at a time and then do a background relayout at the VxVM level to maintain the SAME principle. The 8-disk (4+4) increment would be simpler and less expensive than going with a 16-disk (8+8) increment.

Finally, should this whole set of disks be one gigantic volume, or should I have a datafile volume, a redo volume, an index volume, and so on? Separate volumes would be better from an organizational/administrative point of view, but what about performance? Having everything on separate volumes would guarantee longer seek times as the heads move between volumes on these super-wide stripes. In the long run, I suspect it really won't matter much.

It shouldn't matter, but this is for a multi-terabyte data warehouse that is receiving large amounts of new data hourly as well as crunching what is already in there.

Thanks for the patience it took to get this far.

Ron
