
Re: resolving buffer busy waits

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 15 Sep 2003 20:23:52 +0100
Message-ID: <bk53ig$bk1$1$8300dec7@news.demon.co.uk>

I am fairly sure you have a filesystem problem, and your BBWs are a side-effect. Your comments about changes in throughput after moving files tend to confirm this, and they made me go back to the original trace:

191,000 reads at 9,353 ms/read
plus
44,400 reads at 2,675 ms/read

totals roughly 192,000,000 cs (1 cs = 10 ms) - which is what your v$system_event reports as waits for disk reads.

You always have to assume that such things are coincidences, and not trust them too much - but it's a very suspicious-looking coincidence.
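If you want a quick cross-check, something like the two queries below should do it (a rough sketch against the 9i dynamic views - the event and column names are the standard ones, but treat it as a starting point rather than gospel). readtim and time_waited are both reported in centiseconds, hence the *10 to turn the per-file figure into milliseconds:

  -- sketch: per-datafile average read times (readtim is in cs, so *10 = ms)
  select df.file_id, df.file_name, fs.phyrds,
         round(fs.readtim * 10 / greatest(fs.phyrds, 1), 1) avg_read_ms
  from   v$filestat fs, dba_data_files df
  where  fs.file# = df.file_id
  order by avg_read_ms desc;

  -- sketch: total waits on single- and multi-block reads (time_waited in cs)
  select event, total_waits, time_waited
  from   v$system_event
  where  event in ('db file sequential read', 'db file scattered read');

The second result should tie back (roughly) to the 192,000,000 cs above.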

(And your BBWs are due to sessions waiting for other sessions to finish reading the required blocks - if a block read takes 4.5 seconds, BBWs are very likely to queue up behind it in a busy system.)
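If you want to watch that queueing happen, here is a minimal sketch against v$session_wait - for both of these events P1 is the file# and P2 is the block#, so the waiters line up in the output right next to the session doing the read:

  -- sketch: sessions queued on a buffer vs. the session reading that block
  select sid, event, p1 file#, p2 block#, seconds_in_wait
  from   v$session_wait
  where  event in ('buffer busy waits', 'db file sequential read')
  order by p1, p2, event;

and v$waitstat will tell you which class of block (data block, segment header, undo, etc.) is taking the hits:

  select * from v$waitstat order by time desc;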

Get on to the Unix S/A and see if they can find errors on controllers or disks.

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

  The educated person is not the person
  who can answer the questions, but the
  person who can question the answers -- T. Schick Jr


One-day tutorials:
http://www.jlcomp.demon.co.uk/tutorial.html

____Finland__September 22nd - 24th
____Norway___September 25th - 26th
____UK_______December (UKOUG conference)

Three-day seminar:
see http://www.jlcomp.demon.co.uk/seminar.html
____USA__October
____UK___November


The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"Casey" <cdyke_at_corp.home.nl> wrote in message
news:8bc6b8d7.0309150722.3e002b14_at_posting.google.com...

> thx for the response Jonathan, comments inline.
>
> >
> > b) Your average read times for these tablespaces
> > appear to be massive - normally I assume that
> > this is a bug where oracle is counting time in an
> > unsuitable way - but maybe you really do have
> > a peculiar I/O problem.
>
> yup, that's what i thought. hard to believe the numbers. these two
> tablespaces are busiest in the system. but others are busy and their
> average read times are very much normal. so looking at numbers only
> -- the "issue" seems to be entirely focused on these two datafiles.
>
> >
> > c) Your number and times on enqueues is huge -
> > which may be end-user code and UL locks, but
> > might be an issue with distributed transactions.
> >
>
> app is of questionable integrity, but we're stuck w/it!
>
> >
> > In your case, I would tend to assume that the pure
> > I/O load had to be addressed first, as it might fix the
> > BBW as a side-effect. I would also investigate why
> > you enqueues are so expensive because that might
> > be a totally separate problem that also needs to be
> > addressed. I would not, initially, spend much time
> > trying any of the 'hints and tips' fixes for buffer busy
> > waits.
> >
>
> i like that comment very much. am not eager to play w/the bbw problem
> until i have really identified it _as_ the problem. and that i
> haven't, yet.
>
> what i can add (or expand on my first post) is the following:
>
> - all file systems are UFS
> - this is due to usage of cluster software that was incompatible w/veritas
> - UFS at 2.8 can make use of forcedirectio option, but this caused
> issues early on in the project (last year) and was turned off
> - archive file systems are striped in w/the datafile file systems
>
> very early on we said this combination was a disaster waiting to
> happen (cluster included), but we lost, unfortunately.
>
> now, what i can also add is that this problem has sort of "crept" up.
> monitoring statspack reports has seen average read times creep up from
> low double-digit millisecond times up to the massive ones now. this
> has occurred rapidly in the past 3 wks and sort of "stabilised" at the
> silliness i see now. however, there were odd spikes early this year
> too. so if it was an underlying issue, it seems fair to assume these
> numbers should always be "odd". but maybe that's something about
> oracle i have yet to encounter!
>
> so on one hand i have data indicating it appears to be some sort of
> rapid creep associated with -- perhaps -- load. but on the other, it
> looks like the potential for "odd numbers" has always been there.
>
> and here's something to either laugh at or simply ponder: we have two
> datafile file systems - one had very recently jumped to 92% capacity
> after a normal growth extension. at a stretch, i decided to relocate
> a file, taking that file system down to 87%. these odd numbers w/in
> oracle did drop, but are still whacky -- however, we nearly doubled
> the number of checkpoints completed in an avg 24 hour period that
> evening after the outage. that throughput increase manifested itself
> in much higher application throughput and has been sustained since.
> IO problem you say?
>
> in summation: there are a lot of oddities here. but i have to tread
> carefully.
>
> thx again for your comments Jonathan. very interested to see if the
> extra info provided triggers more comments.
>
> ah - nuno - no NFS ... that i can say w/certainty!
>
> cheers,
>
> casey ...

