Re: I/O Performance/bottlenecks on EMC Symmetrix

From: Don Granaman <granaman_at_home.com>
Date: Fri, 14 Sep 2001 09:14:58 -0700
Message-ID: <F001.0038ECD6.20010914091022@fatcity.com>

!! Please do not post Off Topic to this List !!

It seems that the I/O wait statistics, along with the increased time it takes for the job to run, are pretty good circumstantial evidence (enough for the "grand jury" hearing). That should be sufficient motivation to start the ball rolling to collect more detailed information about the physical layout and what is going on inside the Sym.

They (management, SA, whomever) can't really expect a definitive analysis (the whole trial argument) and a proposed solution in detail (sentencing recommendation???) since you don't have the level of access necessary to get that information.

If you can get very specific wait information and identify the biggest bottlenecks - which datafiles, redo logs, etc. are the worst offenders - it might help build a stronger case. I hesitate a bit on this recommendation because overly specific information of that nature at this time might lead to only a partial solution - a "fix" for only the most severe immediate problems - rather than a more comprehensive review of the physical layout in the Symmetrix.

Also, a comparison of total time waited on these I/O events and job run time between the test system and the EMC system should help. You might be able to see a direct correlation between the I/O waits and the run time. The company paid a premium for EMC storage and should be getting more out of it, not less. My experience has been that EMC is actually pretty good about helping out with gathering statistics, etc. - if you can get them in. (Your mileage may vary.)
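A rough sketch of that comparison in Python, in case it helps to put numbers in front of the SA/management. All figures below are made up for illustration, and the function names are hypothetical. It assumes you have already collected the total time waited per event (in seconds) and the job run time on each system. If the waits are summed across several concurrent sessions, the share can exceed 1, so treat the result as an indicator and not as proof.

```python
# Hypothetical figures for illustration only; substitute measured values.
IO_EVENTS = (
    "db file scattered read",
    "db file sequential read",
    "db file parallel write",
)

def io_wait_summary(waits_seconds, run_time_seconds):
    """Return (total I/O wait in seconds, that wait as a share of run time)."""
    total = sum(waits_seconds.get(event, 0.0) for event in IO_EVENTS)
    return total, total / run_time_seconds

def slowdown_explained(baseline, slow):
    """Fraction of the extra run time that is matched by extra I/O wait."""
    base_wait, _ = io_wait_summary(baseline["waits"], baseline["run_time"])
    slow_wait, _ = io_wait_summary(slow["waits"], slow["run_time"])
    extra_run = slow["run_time"] - baseline["run_time"]
    return (slow_wait - base_wait) / extra_run

# Made-up numbers: seconds waited per event, and job run time in seconds.
test_system = {
    "waits": {
        "db file scattered read": 300.0,
        "db file sequential read": 200.0,
        "db file parallel write": 100.0,
    },
    "run_time": 1800.0,
}
emc_system = {
    "waits": {
        "db file scattered read": 2400.0,
        "db file sequential read": 1500.0,
        "db file parallel write": 600.0,
    },
    "run_time": 6000.0,
}

for label, system in (("test", test_system), ("EMC", emc_system)):
    total, share = io_wait_summary(system["waits"], system["run_time"])
    print(f"{label}: {total:.0f}s I/O wait, {share:.0%} of run time")

print(f"extra run time matched by extra I/O wait: "
      f"{slowdown_explained(test_system, emc_system):.0%}")
```

If most of the extra run time is matched by extra I/O wait, that is the direct correlation worth showing.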

If it is any help... (and anecdotal evidence rarely is) less than a year ago, I took an "EMC approved, big black box" layout, performed a thorough I/O analysis on the database, and rebuilt the disks in the Sym according to more conventional I/O practices - striping, dedicating redo log disks, distributing contending objects between disks/stripe_sets, etc. The "after" configuration throughput was eight times greater than the "before" layout - on identical hardware. And this was with all the disks in question (not the entire Sym though) already being dedicated entirely to a single database. This wasn't an isolated case, just the most dramatic of several. I'm sure that others, especially Gaja, have such stories also.

Again, I would seriously suggest reading his white paper at http://www.quest.com/whitepapers/Raid1.pdf . It has a wealth of information on this very topic and, I believe, it has some specific examples of problem layouts, solutions, and gains. (At least the presentation did.)

Incidentally, it isn't absolutely necessary to dedicate entire disks to a single database. It is usually preferred, in my opinion, but if you don't dedicate disks, you should treat the I/O tuning exercise as if everything using any particular set of disks is a single database (even if some of it isn't database!). For example, say you have DB01 and DB02 sharing a set of disks. Distribute the I/O between disks as if DB01 + DB02 were a single database. Completely independent I/O tuning for DB01 and DB02 probably won't cut it.
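To make the DB01 + DB02 idea concrete, here is a minimal Python sketch. The file names, disk names, and I/O rates are all invented, and the greedy "heaviest file onto the least-loaded disk" rule is just one simple balancing heuristic, not a recommended layout method. The point is only that the files from both databases go into one pool before anything is placed. It also ignores everything except I/O rate (contention between specific objects, dedicated redo log disks, striping), which a real exercise would not.

```python
import heapq

def spread_files(file_io, disks):
    """Greedy placement: heaviest file first, always onto the least-loaded disk."""
    heap = [(0.0, disk) for disk in disks]
    heapq.heapify(heap)
    layout = {disk: [] for disk in disks}
    by_io_desc = sorted(file_io.items(), key=lambda kv: kv[1], reverse=True)
    for name, io_rate in by_io_desc:
        load, disk = heapq.heappop(heap)  # currently least-loaded disk
        layout[disk].append(name)
        heapq.heappush(heap, (load + io_rate, disk))
    loads = {d: sum(file_io[f] for f in files) for d, files in layout.items()}
    return layout, loads

# Invented I/O rates (I/Os per second) for files of BOTH databases, pooled.
combined = {
    "DB01:data01": 400,
    "DB01:data02": 250,
    "DB01:index01": 150,
    "DB02:data01": 350,
    "DB02:index01": 200,
    "DB02:temp01": 150,
}
disks = ["disk_a", "disk_b", "disk_c"]

layout, loads = spread_files(combined, disks)
for disk in disks:
    print(disk, loads[disk], layout[disk])
```

Running the same heuristic separately for DB01 and for DB02 can easily put the busiest file of each database on the same disk, which is exactly the problem with independent tuning.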

Often, you have to win the motivational/political battle before you can even really begin the technical battle. It sounds like that is probably where you are now. So, I'll refrain from polluting the list with more long generic discussion on this topic. Good luck!

-Don Granaman
[OraSaurus - Honk if you remember UFI!]

----- Original Message -----
To: "Multiple recipients of list ORACLE-L" <ORACLE-L_at_fatcity.com>
Sent: Friday, September 14, 2001 4:00 AM

Hi Don,

wait_events that are dominant.
db_file_scattered_reads, db_file_sequential_reads, db_file_parallel_write, sort_segment_request are the dominant waits (right under SQL*Net message from client & rdbms ipc message).
So yes, I know I have an I/O problem. It's just that I'm stuck with an EMC storage solution that I cannot look into. I need "evidence" to go to SA/management and say that the disk layout is no good, or something along that line.

I do know for a fact that each individual disk is 36Gb and sliced into (I believe) 4,5Gb slices. You can therefore be sharing disks with other I/O intensive apps.

As for monitoring tools, I'm the lowly DBA that has no business on UNIX so I need the UNIX people to do this for me. They will help as they are always cooperative, but also very busy, so I need to show them some facts and figures.

TIA Jack


Received on Fri Sep 14 2001 - 11:14:58 CDT