
RE: How many of you use S.A.M.E?

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 2 Feb 2007 12:54:19 -0500
Message-ID: <FBEIIHEAOIFCBBNIIFOGMEGJCMAA.mwf@rsiz.com>


(I'm not sure how my email is screwing up, but I haven't seen this delivered. I'll check with an individual later to see whether it came through; no need to busy the list with "I got it"s.)

-----Original Message-----
From: Mark W. Farnham [mailto:mwf_at_rsiz.com]
Sent: Thursday, February 01, 2007 6:22 PM
To: oracle-l_at_freelists.org
Subject: RE: How many of you use S.A.M.E?

Okay, so there are whole books and many papers, good, bad, and ugly, on this topic.

Grossly oversimplifying, and remembering that cache mitigates the downside of i/o demand collisions, SAME operates like a statmux: every requesting process sticks its straw into a single firehose (or garden hose if you're unlucky) and drinks and blows bubbles in competition with all the other processes and their straws.

I think it was JL who remarked emphatically that if someone else running a database on the same disk farm as him wanted to destroy their own performance, that was okay with him, but he would prefer that they could not destroy his performance. Whether that is parceled out as different tablespaces isolated from each other within a single database or as multiple databases doesn't matter much for the central bit I'm trying to convey. SAME avoids hot spots and tends to even out the load, and that by definition means that if one use is making the disk farm go crazy, everyone suffers equally. That is neither all good nor all bad.

Let's say you have three databases designed to serve EMEA (Europe, the Middle East, and Africa), AMER (the Americas region, you know, from north of Canada all the way south to that bit that almost reaches Antarctica), and ASIA. If those are peak-load oriented to 9AM to 5PM in their local time zones and you smear everything across all the disks evenly like SAME, then you effectively get triple the i/o horsepower for each when you need it. That is the polar case where SAME shines best.
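
To put rough numbers on that polar case, here is a little Python sketch; every figure in it is made up purely to show the arithmetic:

DRIVES = 120
IOPS_PER_DRIVE = 150               # assumed per-spindle random i/o rate
REGIONS = ["EMEA", "AMER", "ASIA"]

# SAME: every region's files are smeared across all the drives, so at its
# local 9-to-5 peak each region sees the whole farm (the other two regions
# are nearly idle at that hour).
same_peak = DRIVES * IOPS_PER_DRIVE

# BORING: each region is confined to its own third of the farm, so at peak
# it only ever sees its own drives.
boring_peak = (DRIVES // len(REGIONS)) * IOPS_PER_DRIVE

print("SAME   peak i/o capacity per region:", same_peak, "IOPS")    # 18000
print("BORING peak i/o capacity per region:", boring_peak, "IOPS")  #  6000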

Now let's say you have three applications that don't share data between them but which simultaneously peak in activity (meaning i/o demand in this case). SAME will minimize hot spots, but it will also maximize seek, read, and write collisions. (I guess DBWR will mitigate the write collisions somewhat, especially if you segregate the redo destinations from the database files [ignoring SAME in that case].)

What if two of the applications are beating the disk drives to death with batch jobs and one of the applications is trying to service interactive user requests? You lose. SAME applies the pain equally to everyone.
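
A hypothetical fair-share model in Python (numbers invented) shows the same thing; the assumption is that SAME splits the farm's aggregate IOPS roughly in proportion to demand, while BORING lets each application win or lose only on its own stripe sets:

FARM_IOPS = 18000                  # whole farm, assumed
PER_APP_IOPS = FARM_IOPS // 3      # each application's own third under BORING

demand = {"batch_1": 20000, "batch_2": 20000, "interactive": 2000}
total_demand = sum(demand.values())

print("SAME (one shared firehose):")
for app, want in demand.items():
    got = FARM_IOPS * want / total_demand
    print("  %-11s wants %6d, gets about %5.0f IOPS" % (app, want, got))

print("BORING (isolated stripe sets):")
for app, want in demand.items():
    got = min(want, PER_APP_IOPS)
    print("  %-11s wants %6d, gets about %5.0f IOPS" % (app, want, got))

The interactive application asks for a trickle and still gets starved under SAME (about 860 IOPS here), while under BORING it keeps its whole slice to itself; the batch jobs, of course, get less.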

Now I'm not sure what became of a paper by Gary Sharpe that I helped write, but it had the neatest pictures of a big disk farm and of how quickly it can become incomprehensible for humans to make good choices (as in your case of 120 disks with 32 slices each) when assembling volumes for Oracle (or anything else) to use. By the way, I'm looking for that paper if anyone has a copy with the animated PowerPoint. I suppose I could redo the work, but that thing is a piece of art and I wouldn't do it justice.

We introduced the concept of "stripe sets": if you take some number of those 120 disks, line them up, and paint a different color across all the disks on each of those 32 slices, you would be looking at 32 stripes and one stripe set. Which disks, and how many disks per stripe set, is something you have to determine for a particular disk farm, taking into account all the things that queue on a disk request, redundancy, the most efficient pairwise hardware mirroring of the drives if that is possible, etc., etc.
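
I can't reproduce those animated pictures here, but a few lines of Python sketch the bookkeeping the picture encodes; the choice of 10 disks per stripe set below is only an assumption for illustration:

DISKS = ["d%03d" % n for n in range(120)]
SLICES_PER_DISK = 32
DISKS_PER_STRIPE_SET = 10     # assumed; you pick this per farm, mirroring, etc.

stripe_sets = {}
for i in range(0, len(DISKS), DISKS_PER_STRIPE_SET):
    name = "SS%02d" % (i // DISKS_PER_STRIPE_SET)
    members = DISKS[i:i + DISKS_PER_STRIPE_SET]
    # one stripe (one color band in the picture) is the same slice number
    # on every member disk of the stripe set
    stripe_sets[name] = {
        "ST%02d" % s: ["%s:s%02d" % (d, s) for d in members]
        for s in range(SLICES_PER_DISK)
    }

print(len(stripe_sets), "stripe sets of", SLICES_PER_DISK, "stripes each")
print(stripe_sets["SS00"]["ST00"][:3])   # ['d000:s00', 'd001:s00', 'd002:s00']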

So then if you look at the picture of the whole disk farm and you want to parcel out storage to different applications or databases, it is child's play, almost boring, to allocate a good solution that makes sense.

In general, though, when you add storage, the minimum increment tends to be a whole tray full of disks (because you want to clone your stripe set definitions for ease of use, and if you just stick one drive in instead and add a little piece of it to each stripe-set-based volume to grow the volume, you will immediately produce a hot spot so intense that it has been known to destroy new drives well before their normal mean time between failures). SAME has a protocol for adding single drives, and ASM automates blending in additional storage over time.
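
Here is a back-of-envelope Python sketch of that single-drive hot spot, with invented numbers, assuming each existing volume sits on its own 10-disk stripe set and you graft a slice of the one new drive onto every volume:

VOLUMES = 12                  # one volume per existing stripe set, assumed
DISKS_PER_VOLUME = 10
IOPS_PER_VOLUME = 1500        # assumed steady load per volume

# Before: every disk serves exactly one volume.
old_disk_before = IOPS_PER_VOLUME / DISKS_PER_VOLUME

# After: the one new drive holds a piece of every volume. Even in the
# kindest case, where it only takes an equal 1/11th share of each
# volume's i/o, it carries twelve of those shares at once.
old_disk_after = IOPS_PER_VOLUME / (DISKS_PER_VOLUME + 1)
new_disk = VOLUMES * IOPS_PER_VOLUME / (DISKS_PER_VOLUME + 1)

print("existing disk before: ~%.0f IOPS" % old_disk_before)   # ~150
print("existing disk after:  ~%.0f IOPS" % old_disk_after)    # ~136
print("the one new disk:     ~%.0f IOPS" % new_disk)          # ~1636

In practice it is worse than this even-share model, because the freshly added space is where all the new, hot extents land.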

It is entirely possible to arrange the Meta Devices to be stripes of a stripe set and then to allocate the Meta Devices from a given stripe set to only one database. This is part of the BORING protocol. You can implement it with disk groups in ASM. If isolation of disk i/o demand is what you want, that is as good a way to do it as any, either with ASM or by hand. For the disk farm interfaces I am aware of, you have to do the bookkeeping to keep track of which [meta devices, volumes, plexes, plex sets, make up your own new name] are which and which disks comprise them. Using consistent nomenclature can automatically create a virtual notebook, but you have to remember that the volume managers are not going to enforce your nomenclature against your typos.
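
Since the volume manager won't police your names, even a trivial script can serve as that virtual notebook; the naming scheme here (<DB>_SS<nn>_ST<nn>) is invented purely for illustration:

import re

NAME = re.compile(r"^(?P<db>[A-Z]+)_SS(?P<ss>\d{2})_ST(?P<st>\d{2})$")

volumes = ["EMEA_SS03_ST12", "AMER_SS07_ST01", "EMEA_S503_ST13"]  # last one is a typo

notebook = {}
for v in volumes:
    m = NAME.match(v)
    if not m:
        print("not in convention (typo?):", v)
        continue
    notebook.setdefault(m.group("db"), set()).add("SS" + m.group("ss"))

for db in sorted(notebook):
    print(db, "owns stripe sets", sorted(notebook[db]))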

Arranging things in this BORING way is also conducive to good thinking about adding faster media types to an existing disk farm. Oh, BORING is Balanced Organization of Resources in Natural Groups. So if you add some 15Krpm, 256M cache drives to a farm previously made up of 7.2Krpm, 4M cache drives, don't mix them into existing stripe sets. Likewise if you add some mirrored (aka duplexed) persistent RAM disk devices. Make them separate stripe sets, and separate disk groups if you're using ASM.
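
A one-minute Python check that nothing has mixed media classes inside a stripe set (or ASM disk group); the inventory map is made up for illustration:

stripe_set_media = {
    "SS00": ["7.2Krpm/4M"] * 10,
    "SS12": ["15Krpm/256M"] * 10,
    "SS13": ["15Krpm/256M"] * 9 + ["7.2Krpm/4M"],   # an old drive slipped in
}

for ss, drives in sorted(stripe_set_media.items()):
    kinds = sorted(set(drives))
    print(ss, "ok" if len(kinds) == 1 else "MIXED media: %s" % kinds)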

So you still stripe and mirror everything, just not all in one piece. And to the extent you are able to divide the use of channels, cache, and disk platters, you will isolate (protect) the response time of one application from high use by other applications. Isolating cache usage runs from easy to impossible depending on what the disk array supports. Interestingly enough, if you cannot partition cache and your i/o demand underflows what the cache is capable of, then after warmup any old SAME and a perfectly arranged BORING layout will perform the same. (You also won't be asking the question below if your load demand underflows cache capability.)
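
A trivial sketch of that underflow point, with assumed numbers; the only idea is that once the array cache can absorb the steady-state demand, the physical layout underneath stops being the bottleneck:

CACHE_ABSORBS_IOPS = 25000    # what the array cache front end can soak up, assumed
steady_demand_iops = 8000     # your warmed-up steady-state demand, assumed

if steady_demand_iops < CACHE_ABSORBS_IOPS:
    print("cache absorbs the load: SAME vs. BORING is mostly academic")
else:
    print("cache overflows: the physical layout underneath matters")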

Now, lest someone think I am being unfair to SAME, remember that if you don't actually have a disk performance problem, then some variety of SAME is probably the cheapest to configure and maintain. Also, notice that in the timezone peak load case, with BORING you have less total disk throughput to serve each timezone while the disks for the other time zones sit nearly idle. Of course that might be a good time to run batch jobs and back up the other time zones, but SAME would have made all the horsepower available to each time zone.

BORING takes a bit more configuration, or a lot more, depending on the technology and tools you have. If you have no idea what the load demands of the different applications or databases will be, then you don't really have a basis for configuring BORING for an immediate performance advantage, but if you keep it really boring it will be easy to reallocate. There was a time when it seemed like the vendors assembled the disk farms in the worst possible way at the factory and then you had to pay extra for tool suites to rebuild them sensibly, but I must have been imagining that (which I write for legal purposes).

SAME and BORING each have strong points and best use cases. What you seem to indicate you have below may be what I call "HAPHAZARD", for which I have no spelled-out words. Autogenerated HAPHAZARD may be okay as long as you never have to look at it and understand it. And you might not have to look at it, except that you seem to think you are having i/o problems, so I guess you do have to look at it.

Finally, if perchance you acquire a disk farm that is 50/50 divided between test and production so that your load simulations in test will be "just like the real thing", make very certain you understand which way it was cut in half before you let someone start a load test after production is in service. If they allocated half the disks and half the channels, etc. to each, you'll be fine. If they cut each disk platter in half by defining partitions... you likely won't be fine.

Regards,

mwf

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Feb 02 2007 - 11:54:19 CST
