Prashant, Folks
From Statspack (as asked), SQL ordered by Gets:
  Buffer Gets  Executions  Gets per Exec  % Total  Hash Value
  -----------  ----------  -------------  -------  ----------
   56,896,369           1   56,896,369.0     22.8  1466052601
select ocp.set_num setid, gam.schm_code schm, gam.foracid acct, gam.acct_name acct_name, oci.instrmnt_id Instr_No, ocp.tran_amt inst_amt, mpt.mp_topic topic from mpt, gam, ocp, oci where ocp.sol_id = '212' and ocp.clg_zone_date = '15-10-2005' and ocp.clg_zone_code = 'OW-MICR' and ocp.c
   56,874,454           1   56,874,454.0     22.8  1143844609
select ocp.set_num setid, gam.schm_code schm, gam.foracid acct, gam.acct_name acct_name, oci.instrmnt_id Instr_No, ocp.tran_amt inst_amt, mpt.mp_topic topic from mpt, gam, ocp, oci where ocp.sol_id = '212' and ocp.clg_zone_date = '15.10.2005' and ocp.clg_zone_code = 'OW-MICR' and ocp.c
   28,324,295      38,167          742.1     11.4  2332700722
SELECT TABLE_ABBR, TABLE_KEY, TABLE_FUNC, CONTEXT_SOLID from SOD where ROWNUM <= 250
    7,442,129   1,019,697            7.3      3.0   696473406
SELECT table_func FROM SOD WHERE TABLE_ABBR = :1 and TABLE_KEY = :2 and CONTEXT_SOLID = :3 FOR UPDATE NOWAIT
Another basic question: if the init.ora parameter disk_asynch_io is set to FALSE, is usage of the AIO servers (of AIX) automatically disabled?
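For reference, the current AIO setup on the box can be dumped like this (aio0 is the default AIO device name on AIX 5.2; happy to post the output if useful):

# Show the AIO subsystem attributes (minservers, maxservers, maxreqs, autoconfig)
lsattr -El aio0
# Count the aioserver kernel processes currently active
pstat -a | grep aios | wc -l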
Will provide any info needed
Thanks indeed
-----Original Message-----
From: Mark Brinsmead [mailto:mark.brinsmead_at_shaw.ca]
Sent: Saturday, November 12, 2005 6:25 AM
To: ade.turner_at_gmail.com
Cc: VIVEK_SHARMA; 'Oracle-L'
Subject: Re: High wio on New Hitachi SAN Storage of DB Server
Vivek and Adrian,
I'm afraid I can't offer any "answers", but I do have a "dumb" question that might be helpful. (Or not.)
Are you using Async-I/O, or Concurrent I/O? How many AIO servers have you configured?
If you are truly using Async-I/O against "cooked" JFS2 storage, there are a couple of fairly obvious options available to you.
One would be to try Async I/O with raw partitions instead. As I recall, AIX 5.2 (and later) uses a kernel-based AIO driver for raw devices, as opposed to AIO servers. If you see a pronounced performance difference by moving to raw partitions (which aren't *too* awful with the AIX volume manager), this would probably suggest that you have misconfigured the AIO servers, either by not permitting enough of them or perhaps by not allowing them to queue enough I/Os per (logical) disk...
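If it helps, the per-disk queue setting I mean can be checked and raised like so (hdisk4 and the value 16 are placeholders only; -P defers the change until the next reboot if the disk is busy):

# Show the current queue depth for one logical disk
lsattr -El hdisk4 -a queue_depth
# Raise it; -P applies the change at the next reboot
chdev -l hdisk4 -a queue_depth=16 -P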
Another option, at least for Adrian, is to mount your filesystems with the "cio" option and enable Concurrent I/O in the database; that way you will be able to bypass the AIO servers completely.
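A rough sketch of the mount side (the logical volume and mount point names here are made up; note that 8.1.7 predates the filesystemio_options parameter, so the mount option does the work):

umount /u02/oradata
# Remount the JFS2 filesystem with Concurrent I/O enabled
mount -v jfs2 -o cio /dev/oradatalv /u02/oradata

One caution: cio disables filesystem caching for everything on that filesystem, so keep Oracle binaries and other non-database files on separate mounts.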
I'm afraid I'm out of the office right now, so I can't easily offer more details. There are several whitepapers on CIO for Oracle (or databases in general) on IBM's website. You'll also find a considerable amount of discussion (now maybe a year or two old) about AIX and CIO on MetaLink.
At a (very) wild guess, it sounds like maybe you (or your sysadmins) have substantially under-configured one or both of the AIO "minservers" and "maxservers" settings (I think those are the terms used by SMIT), and/or maybe need to "lie" to AIX about the number of I/Os that can be queued for each of your logical disks. (I've never bothered to tweak the latter; I've never had to.)
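For what it's worth, those SMIT fields map to attributes of the aio0 device, so the same change can be scripted (the numbers are placeholders, not recommendations; on AIX 5.2, maxservers is, I believe, interpreted per CPU):

# Raise the AIO server limits; -P defers the change to the next reboot
chdev -l aio0 -a minservers=20 -a maxservers=40 -P
# Or change them interactively via the SMIT fastpath
smitty aio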
For the record, my configuration is P610s going against an IBM ESS disk array (yeah, I know, small potatoes) and, at least indirectly, P570s going against IBM ESS disk arrays and EMC CX-700 disk arrays (medium-sized potatoes). I've never seen indications of your kind of problems with either.
One other silly question: is there any chance that somebody has "goofed" with the RAID configuration? 70% WIO is *extremely* high -- high enough to maybe suggest the possibility that somebody has configured hardware or software RAID-1 between two "logical" disks that are actually located on the same "physical" disk. I *know* this one is a long shot, but I *have* crossed paths with people who have actually done this, with software RAID at least. Does 'sar' suggest consistently poor performance for all devices?
Adrian wrote:
>Hi Vivek,
>
>Your note interested me; we have exactly the same configuration and symptoms,
>but we are still running async_io = TRUE. I concur with your findings on the
>extreme performance issues - short-term server freezes are occurring. You
>mention a bug; is this hearsay or do you have details?
>
>If you've turned off async_io (on by default on AIX), then you may want to
>use DBWR_IO_SLAVES. I've not tried this yet.
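For reference, the untried change would amount to these init.ora lines (the slave count of 4 is illustrative only):

disk_asynch_io = FALSE
dbwr_io_slaves = 4

With asynchronous I/O disabled, the I/O slave processes simulate asynchronous writes for DBWR using ordinary synchronous calls.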
>
>Anyway, ours is an HDS9990 with McData switches attached to a p690 running
>AIX 5.2 ML4. The filesystems are jfs2.
>
>I can reproduce the problem just by creating a 10GB datafile; multiple users
>doing random I/O can also hit the same issue, as can a parallel RMAN backup
>in progress. However, concurrent cp's of multiple files do not reproduce the
>issue (hence I believe it is likely an async I/O problem).
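A sketch of that repro step (the tablespace name and datafile path are made up; on 8.1.7 it can be run from sqlplus):

sqlplus /nolog <<'EOF'
connect internal
-- Formatting a 10 GB datafile drives a heavy write burst through
-- Oracle's own I/O path, which a plain cp does not exercise
alter tablespace scratch add datafile '/u02/oradata/scratch02.dbf' size 10000M;
exit
EOF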
>
>Under load, topas shows high wait for I/O, and sar -q shows that %swpocc = 100
>and swpq-sz > 20.
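For reference, those figures come from runs like the following (interval and sample count are arbitrary):

# CPU breakdown including %wio: 12 samples at 60-second intervals
sar -u 60 12
# Run/swap queue sizes and occupancy over the same window
sar -q 60 12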
>
>My unix admins are currently looking at the async I/O settings as per
>MetaLink note 271444.1, but they are heavily loaded and this is not prod
>(yet), so the urgency is low.
>
>If you or anyone else has any pointers with this configuration, please let
>me know.
>
>Kind Regards
>Adrian
>
>-----Original Message-----
>From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
>On Behalf Of VIVEK_SHARMA
>Sent: 11 November 2005 18:56
>To: Oracle-L
>Subject: High wio on New Hitachi SAN Storage of DB Server
>
>
>Folks,
>
>While doing application testing of a hybrid transaction load (mostly OLTP)
>with approximately 200 users on a NEWLY configured HITACHI SAN storage, high
>wait for I/O (wio = 70%) is observed on the DB server (AIX) until 1400 hours.
>
>NOTE - wio reduced gradually to about 10% from 1400 hours to 2000 hours.
>
>Average sar output from morning to 1400 hours:
>
>13:38:00  %usr  %sys  %wio  %idle
>13:43:00     6     5    67     22
>13:48:00    10     6    74     10
>13:53:00    10     5    66     19
>13:58:00     7     5    61     27
>14:03:00     5     5    67     22
>14:08:00     7     5    74     15
>14:13:00     9     6    69     15
>
>CONFIG
>======
>DB Server = 7 CPUs
>OS AIX 5.2
>Oracle 8.1.7
>Hitachi Storage for DB Server = 40 underlying Disks
>Hardware Raid
>Init.ora disk_asynch_io = FALSE (when disk_asynch_io is set TRUE there is
>extreme performance degradation with AIX 5.2 [seems a bug])
>
>
>Comments by IBM
>===============
>The average number of processes waiting for IO to complete is 12. This is
>the reason why we are seeing an average iowait of 70%.
>
>The seek rate of 95.72% on the hdsdb9960lv LVs indicates a high degree of
>random IO, usually caused by the application or by a high degree of disk
>fragmentation.
>
>
>STATSPACK report (will provide any other sections as needed)
>================
>
>DB Name      DB Id      Instance  Inst Num  Release    OPS  Host
>------------ ---------- --------- --------- ---------- ---  ------
>ABNPROD      34298189   abnprod          1  8.1.7.4.0  NO   findb
>
> Snap Id Snap Time Sessions
> ------- ------------------ --------
> Begin Snap: 5 15-Oct-05 13:00:55 352
> End Snap: 6 15-Oct-05 14:00:37 352
> Elapsed: 59.70 (mins)
>
>Cache Sizes
>~~~~~~~~~~~
> db_block_buffers: 215000 log_buffer: 18874368
> db_block_size: 8192 shared_pool_size: 754288000
>
>Load Profile
>~~~~~~~~~~~~                       Per Second    Per Transaction
>                                  ------------   ---------------