Re: Miserable Disks

From: Mark Brinsmead <pythianbrinsmead_at_gmail.com>
Date: Thu, 25 May 2006 20:55:15 -0600
Message-ID: <cf3341710605251955g5f26fb3fpc20364f55216cdac@mail.gmail.com>

Hmmm...

RAID-5? Large SATA disks? No Cache?

This is already an excellent recipe for incredibly poor performance! But I can probably make it worse:

(*) Make your RAID stripe width really narrow (like maybe 128 or 512 bytes) and then make
your database blocks large (like 8KB or 16KB) and do lots of multi-block I/O, so that every
I/O is assured to engage every physical drive. (Kiss concurrency goodbye and watch your
RIOPs numbers plummet.)

(*) Cause one of your (huge) SATA drives to fail. All I/Os against the "missing" disk will
now require all kinds of extra work until the disk array can "swap-in" the hot spare. If the
disks are big enough, that could take a *long* time, and it may even consume a substantial
portion of the disk-array's internal bandwidth while doing so.

(*) Place data foolishly on the RAID-5 stripe. Make a logical volume at the inner edge for
online redologs, and a logical volume at the outer edge for archived redologs. Then commit
frequently, and change logfiles rapidly. Or (much) worse -- make two logical volumes on the
RAID-5 stripe, and then allow your OS to do software RAID-1 with them. Yes! That will
"suck *and* blow at the same time"! (Sorry, I'm quoting Bart Simpson there -- hope that
doesn't lose too much meaning by possibly crossing cultural boundaries.)

(*) Mix workloads. Choose your most I/O intensive workloads (e.g., "backups" and "batch
processing") and run them at the same time. A personal favorite from *way* back.

(*) Use your RAID array to make lots of SNAPSHOTS (or whatever EMC calls 'em) of your
RAID-5 set, and keep them active during your peak workloads. After all, they're free, right?
Wrong! In fact, with write cache disabled, this would probably add *large* overhead to every
WRITE, but I am only speculating about that.

(*) Fail to correctly implement "Multi-Path I/O" between your server and the disk array. While
you're at it, use unsupported HBAs, a (single) Fibre-Channel switch that was dropped on a
concrete floor just prior to installation, and for good measure, step on your optical cables a
few times. And maybe even misconfigure the Duplex settings. (I don't know whether that is
actually possible with Fibre-Channel, but it sure plays havoc with Ethernet!)

(*) Logically subdivide your RAID-5 storage, and assign a big chunk of it to another system.
Like maybe the corporate e-mail system, or maybe that clandestine Video-on-Demand
server your Windoze sysadmins have been operating out of their cubicles.
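
To put a rough number on the first item in the list above, here is a minimal sketch of how many
data drives a single database block read has to engage for a given per-disk stripe element (what
I loosely called "stripe width" there). The function name, the plain round-robin layout, and the
4-data-drive example are assumptions for illustration only; a real array rotates parity and may
lay things out differently.

```python
def drives_touched(io_bytes, stripe_element_bytes, data_drives, offset_bytes=0):
    """Return how many distinct data drives one logical I/O has to engage.

    Assumes plain round-robin striping of data across `data_drives` and
    ignores the parity drive; real arrays differ in detail.
    """
    # Index of the first and last stripe element covered by this I/O.
    first = offset_bytes // stripe_element_bytes
    last = (offset_bytes + io_bytes - 1) // stripe_element_bytes
    elements = last - first + 1
    # Consecutive elements land on consecutive drives, so the count of
    # distinct drives is capped by the number of data drives.
    return min(elements, data_drives)


if __name__ == "__main__":
    KB = 1024
    for element in (512, 64 * KB):
        n = drives_touched(8 * KB, element, data_drives=4)
        print(f"stripe element {element:>6} bytes: 8KB block touches {n} of 4 data drives")
```

With a 512-byte element an 8KB block spans 16 elements, so every drive has to seek for every
block. With a 64KB element an aligned 8KB block stays on one drive, and the other drives are
free to serve somebody else's I/O.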

I think I could probably come up with a few more bonehead ideas (many of which I have actually
*seen*!) but this is probably enough.

Charlotte, is there any chance that any of these additional "bonus" features have been
sent your way? The first one is particularly likely, as I have seen EMC (or DELL)
documentation that actually *encourages* that particular madness.

Anyway, you may well be gathering your performance statistics in the wrong place. What
kind of stats can you get out of the EMC disk array? Maybe your Storage Administrator
will actually *help* with that. The average I/O size performed by the OS is meaningless
(at least with cache disabled) because these are all "virtual" I/Os. It's the *physical*
I/Os happening inside the disk array that really count!

Oh, by the way, I happen to know the site that Wolfgang referred to. It may amuse you to
know that this particular site (last time I heard from them, anyway) was planning to deploy
about 15TB of *new* storage in their existing CX-700 disk array, using 500GB SATA disks
in a RAID-5 configuration. I think Wolfgang knew of this plan -- perhaps he was just too
polite to point it out. ;-)
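
For anyone wondering why that plan makes me wince, here is some back-of-envelope arithmetic. It
is only a sketch: the 80 random IOPS per drive and the 146GB comparison disk are hypothetical
figures I picked for illustration, I use decimal units (15TB = 15000GB), and I ignore parity
and hot spares entirely.

```python
import math


def spindles_needed(capacity_gb, disk_gb):
    """Number of drives needed to hold capacity_gb (no parity, no spares)."""
    return math.ceil(capacity_gb / disk_gb)


def aggregate_iops(capacity_gb, disk_gb, iops_per_drive):
    """Total random IOPS available if every drive delivers iops_per_drive."""
    return spindles_needed(capacity_gb, disk_gb) * iops_per_drive


if __name__ == "__main__":
    CAPACITY_GB = 15000      # the 15TB from the plan, in decimal GB
    IOPS_PER_DRIVE = 80      # hypothetical per-drive figure, not a measurement
    for disk_gb in (500, 146):
        n = spindles_needed(CAPACITY_GB, disk_gb)
        total = aggregate_iops(CAPACITY_GB, disk_gb, IOPS_PER_DRIVE)
        print(f"{disk_gb}GB disks: {n} spindles, roughly {total} random IOPS in total")
```

Same capacity, less than a third of the spindles. Whatever the real per-drive number is, the
array has far fewer heads to spread the random I/O over.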

Hmmm... This may be a good time to insert a shameless plug. Have a look at Paul Vallee's
article on "BAHD" (Battle Against Huge Disks), which you can find at

http://www.pythian.com/blogs/170/750g-disks-are-bahd-for-dbs-a-call-to-arms

Yes, Paul *is* my boss. (I did say this was a shameless plug!) But that doesn't mean that
I can't respect his opinion at least occasionally. ;-) [[Actually, I respect his opinion ALL the
time. Really. Did you hear that, Paul? I *know* you read this list! ;-) ]]

You won't find anything there that relates to your specific problem (at an engineering level,
at least) but maybe reading it might help you feel better. Or maybe you could print it, bind
it, and use it to beat your PHB. Whatever.

Anyway, best of luck with your problem. By the way, I still respect those Clariion disk arrays
sold by EMC. [[Disclaimer -- I also used to work in pre-sales for Data General, who pioneered
the brand name.]] If configured correctly, you ought to be able to make these things scream.
In a *good* way.

In my mind, that *usually* means:

(*) Use the smallest disks you can convince your tight-wad PHB to order. Or better, buy
stinking huge (TM) disks, but use only the outermost 1%. Can you say "magnetic drum"?
(*) Use RAID-10.
(*) Get the biggest cache you can afford. (Make sure it's *really* non-volatile. Wolfgang
never mentioned what happens to your database when you lose the contents of the write
cache, but *I* got a taste! And it wasn't pleasant!)
(*) Dedicate as much of the cache as you can to WRITEs. Oracle is already really good
at caching READs, but in the absence of special hardware (i.e., non-volatile RAM) it is quite
limited in how it can cache WRITEs.
(*) Don't share. Don't allow your tight-wad PHB to put the corporate e-mail system on the
same disks you use for your database. (Yeah, right. Like you have a choice!) Heck,
don't let other applications share your *disk array* let alone your spindles. Every I/O they
do is one (or more!) less I/O that you can do.
(*) Don't be afraid to dedicate some spindles to specific uses, but don't assume that you
need to either. This one can be a tough balancing act, but when in doubt, plan to stick
to the principles of S.A.M.E (stripe and mirror everything) unless you have compelling
reasons to do otherwise.
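
In case the "Use RAID-10" item above looks like mere prejudice, here is a minimal sketch of the
usual small-write arithmetic with no write cache to hide it. A small random write on RAID-5
costs four physical I/Os (read old data, read old parity, write new data, write new parity),
while on RAID-10 it costs two (one per mirror side). The function name, the 8 drives, and the
80 IOPS per drive are hypothetical values for illustration, not measurements from any array.

```python
RAID5_WRITE_PENALTY = 4   # read old data, read old parity, write new data, write new parity
RAID10_WRITE_PENALTY = 2  # write the block to both sides of the mirror


def random_write_iops(drives, iops_per_drive, write_penalty):
    """Host-visible small random writes per second, with no write cache helping."""
    return drives * iops_per_drive / write_penalty


if __name__ == "__main__":
    DRIVES = 8             # hypothetical spindle count
    IOPS_PER_DRIVE = 80    # hypothetical per-drive random IOPS
    r5 = random_write_iops(DRIVES, IOPS_PER_DRIVE, RAID5_WRITE_PENALTY)
    r10 = random_write_iops(DRIVES, IOPS_PER_DRIVE, RAID10_WRITE_PENALTY)
    print(f"RAID-5 : about {r5:.0f} small random writes/sec")
    print(f"RAID-10: about {r10:.0f} small random writes/sec")
```

Same spindles, twice the small-write throughput. RAID-10 does give you less usable capacity
from those spindles, which is the part your PHB will notice.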

But of course, you already know all of this. If you didn't you wouldn't have asked the question,
would you? ;-)

Sorry.

Please let us know how (or whether) you resolve this.

-- 
Cheers,
-- Mark Brinsmead
   Staff DBA,
   The Pythian Group
   http://www.pythian.com/blogs

--
http://www.freelists.org/webpage/oracle-l

Received on Thu May 25 2006 - 21:55:15 CDT