
Re: Strange IO problem on T3

From: jfixsen <jfixsen_at_virtumundo.com>
Date: 23 Jan 2004 05:20:01 -0800
Message-ID: <26bc4d13.0401230520.20dfd89e@posting.google.com>


Brian, thanks, but that didn't help; it was the first thing I tried. Turning async I/O off and going with the slaves made things even slower, though it didn't produce any aiowait messages, so I turned it back on. I actually haven't seen any aiowait messages since setting TS:ts_sleep_promote=1 in /etc/system, but performance is still the same, i.e. bad. I have to think the aiowait is just a symptom of a bigger problem.
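For what it's worth, turning async I/O off and falling back to the I/O slaves in 9.2 comes down to two static init parameters. A minimal sketch, assuming an spfile is in use (the slave count here is purely illustrative):

    -- Disable async I/O and let I/O slave processes simulate it instead.
    -- Both parameters are static, so SCOPE=SPFILE and a bounce are required.
    ALTER SYSTEM SET disk_asynch_io = FALSE SCOPE = SPFILE;
    ALTER SYSTEM SET dbwr_io_slaves = 4 SCOPE = SPFILE;  -- illustrative count

    -- Reverting to async I/O afterwards:
    ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE = SPFILE;
    ALTER SYSTEM SET dbwr_io_slaves = 0 SCOPE = SPFILE;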

Brian Peasland <dba_at_remove_spam.peasland.com> wrote in message news:<40103A8D.15BF15C6_at_remove_spam.peasland.com>...
> Check out Note 222989.1 on Metalink!
>
> Cheers,
> Brian
>
>
>
> jfixsen wrote:
> >
> > Hello!
> >
> > Oracle 9.2.0.4
> > SunOS pchi-db01 5.8 Generic_108528-19 sun4u sparc
> > SUNW,Ultra-Enterprise
> > System = SunOS
> > Node = pchi-db01
> > Release = 5.8
> > KernelID = Generic_108528-19
> > Machine = sun4u
> > BusType = <unknown>
> > Serial = <unknown>
> > Users = <unknown>
> > OEM# = 0
> > Origin# = 1
> > NumCPU = 8
> >
> > History: we had been using a SAN disk array for storage and then
> > switched over to a Sun T3. About a week after moving to the T3, I saw
> > the following message in my alert log: WARNING: aiowait timed out 1
> > times. No performance problems happened until about a month after
> > that.
> >
> > Problem: We started seeing huge performance problems out of nowhere
> > on December 16, on everything from big batch jobs (heavy FTS) to
> > simply logging in, and I started seeing several of these aiowait
> > messages each day, sometimes up to 20. No application changes were
> > made at any time during any of this, and after the performance
> > problems started, I even cut the load (mostly big FTS jobs) way back.
> >
> > I had been running statspack every day the entire time, and on the day
> > the performance problems hit hard (about 5 weeks after going to the
> > T3), the noticeable differences I saw in the statspack reports were
> > all related to writes, from what I could tell.
> >
> > BEFORE:
> >
> > Top 5 Timed Events
> > ~~~~~~~~~~~~~~~~~~
> >                                                                    % Total
> > Event                                               Waits    Time (s) Ela Time
> > -------------------------------------------- ------------ ----------- --------
> > CPU time                                                   43,689,174    92.51
> > db file scattered read                        131,668,468     949,948     2.01
> > PX Deq: Execute Reply                             931,750     496,692     1.05
> > direct path read                               73,177,620     489,356     1.04
> > PX Deq Credit: send blkd                       24,148,414     425,685      .90
> >
> > AFTER:
> >
> > Top 5 Timed Events
> > ~~~~~~~~~~~~~~~~~~
> >                                                                    % Total
> > Event                                               Waits    Time (s) Ela Time
> > -------------------------------------------- ------------ ----------- --------
> > log file sync                                     874,418     604,164    32.92
> > direct path write                                 121,724     233,840    12.74
> > PX Deq Credit: send blkd                        1,103,485     212,285    11.57
> > db file scattered read                         12,039,568     165,860     9.04
> > log buffer space                                  158,742     127,009     6.92
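With log file sync and log buffer space suddenly at the top, the cumulative redo statistics are worth checking alongside the snapshots. A minimal sketch against v$sysstat, using the 9.2 statistic names:

    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('redo log space requests',
                    'redo buffer allocation retries',
                    'redo synch writes',
                    'user commits');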
> >
> > BEFORE:
> >
> >                                                                   Avg
> >                                                     Total Wait   wait    Waits
> > Event                               Waits   Timeouts   Time (s)   (ms)    /txn
> > ---------------------------- ------------ ---------- ---------- ------ --------
> > db file scattered read        131,668,468          0    949,948      7     60.4
> > PX Deq: Execute Reply             931,750    211,623    496,692    533      0.4
> > direct path read               73,177,620          0    489,356      7     33.5
> > PX Deq Credit: send blkd       24,148,414    122,149    425,685     18     11.1
> > db file sequential read       31,794,392           0    349,188     11     14.6
> > direct path write               2,652,105          0    309,880    117      1.2
> > log file sync                   3,021,375    104,267    201,582     67      1.4
> > db file parallel write            547,546    254,564     68,136    124      0.3
> > enqueue                            27,670     14,655     50,246   1816      0.0
> > log buffer space                  110,172     32,943     47,180    428      0.1
> >
> > AFTER:
> >
> >                                                                   Avg
> >                                                     Total Wait   wait    Waits
> > Event                               Waits   Timeouts   Time (s)   (ms)    /txn
> > ---------------------------- ------------ ---------- ---------- ------ --------
> > log file sync                     874,418    534,000    604,164    691      3.4
> > direct path write                 121,724          0    233,840   1921      0.5
> > PX Deq Credit: send blkd        1,103,485     94,421    212,285    192      4.3
> > db file scattered read         12,039,568          0    165,860     14     46.5
> > log buffer space                  158,742    120,911    127,009    800      0.6
> > PX Deq: Execute Reply             742,346     61,708    126,984    171      2.9
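Between statspack snaps, the same wait ranking can be read live from v$system_event. A rough sketch (9i reports these times in centiseconds, hence the conversions):

    SELECT event, total_waits, total_timeouts,
           ROUND(time_waited / 100) AS time_waited_s,
           ROUND(average_wait * 10) AS avg_wait_ms
      FROM v$system_event
     ORDER BY time_waited DESC;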
> >
> > I am waiting on a response from Sun, but I want to be sure it's not
> > something I'm overlooking. We've got all the latest Sun patches, and
> > one post I saw that had to do with the aiowait message said to change
> > /etc/system to include the following parm after installing the patch
> > (which we already had):
> >
> > * this parm is associated with the aiowait errors that were corrected
> > * by patch 112255-01 (Solaris 8)
> > set TS:ts_sleep_promote=1
> >
> > I haven't seen the aiowait message in the alert log since the parm was
> > added and the db restarted, but performance still sucks and the
> > statspack numbers are still just as bad. One thing that's consistently
> > noticeable: when I try logging into sqlplus, it hangs for 10-60
> > seconds. If I look at the wait events from another session, it always
> > waits on log buffer space initially, and then log file sync. Or if I
> > try dropping a small, simple table, it gives me the enqueue wait.
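Watching what a stuck logon is waiting on from a second session looks roughly like this; the SID of 42 below is a placeholder for whatever v$session shows for the hanging session:

    -- Find the hanging session's SID first:
    SELECT sid, serial#, username, program
      FROM v$session
     WHERE username IS NOT NULL;

    -- Then poll its current wait state (42 is a placeholder SID):
    SELECT sid, event, state, wait_time, seconds_in_wait
      FROM v$session_wait
     WHERE sid = 42;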
> >
> > To me, this all smells like a Sun IO problem, since one day it "just
> > started happening", but I would love everyone's opinions.
> > Thanks
> >
> > Jason
> > jfixsen_at_nospam_virtumundo.com
>
> --
> ===================================================================
>
> Brian Peasland
> dba_at_remove_spam.peasland.com
>
> Remove the "remove_spam." from the email address to email me.
>
>
> "I can give it to you cheap, quick, and good. Now pick two out of
> the three"
Received on Fri Jan 23 2004 - 07:20:01 CST
