Re: intermittent very high waits in LGWR on Linux?

From: DA Morgan <damorgan_at_psoug.org>
Date: Thu, 30 Jun 2005 06:54:43 -0700
Message-ID: <1120226088.751031@yasure>

bugbear wrote:
> Noons wrote:
>

>> bugbear apparently said,on my timestamp of 1/07/2005 9:14 PM:
>>
>>> But the spread...
>>> 89% are under 29177 microseconds AKA 29 milliseconds
>>> with a "reasonable" spread.
>>>
>>> But the remaining 11% are over 986468 microseconds,
>>> which is extraordinarily close to 1 second.
>>>
>>> Indeed, there are only 3 times (out of 4922)
>>> above 29177 but below 986468.
>>>
>>> It seems that I either get "correct" redo
>>> log write out, with times varying from 53 to 29177
>>> microseconds, or I "fallback" to some kind of quantized
>>> timeout write behavior, driven by a 1 second clock.
>>>
>>> This is gettin' weird.
>>
>>
>>
>> Not really.  I think it has to do with the fs cache
>> flush and the remaining parameters in your spfile.
>>
>> If I were you I'd take the timings of the approx 80% and ignore
>> the others.  Once you decide on a spread for running a real
>> live test with a more realistic config, you'll be able to
>> get rid of those last 20% with a complete setup geared for
>> Oracle 9i.
>>
>> One of the development targets for 10g was precisely to make
>> it perform significantly better on default setup,
>> resource-limited systems.  This, so that first time users would
>> get a "better" impression of the product.
>>
>> Previous releases (9i included) were notorious for default
>> setups that were nothing short of moronic.  This situation got
>> aggravated with the SPFILE as it is now binary data and therefore
>> not obvious what is inside it.  Hence the CREATE PFILE FROM
>> SPFILE incantation Holger referred to: it's the most expedient
>> way to dump ALL parameters set to anything other than
>> default.
>>
>> Or you can try to SELECT the NAME and VALUE columns from the
>> view V$PARAMETER.  You wouldn't believe some of the dumb values
>> 9i defaults to!  It could also well be in archivelog mode,
>> which will slow you down periodically on a single disk system.
>>
>> You'll be able to get similar performance to 10g, it just
>> needs a bit more attention to detail.  Which is probably
>> hidden at this stage behind the "everything in the same
>> f/s, default install" syndrome.
>>
>> It's a common occurrence.  Hence my recommendation you take
>> your timings from the 80% as the typical results on a
>> properly setup system.  The purpose of making your redo logs
>> larger was precisely to try and highlight bottlenecks on
>> switching redos: one of the most common performance traps
>> before 10g.
>>
>> Bottom line: take the bad 20% or so with a very large grain of
>> salt and extrapolate based on the 80%.  9i can indeed be
>> tuned for more even performance but you probably do not want
>> to do that at this stage: not worth it.
>>
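For anyone who wants to check the fast/slow split on their own timings before deciding which samples to trust, here is a minimal sketch. The two thresholds are the figures quoted above; the function names and the sample list are made up for illustration, and a wait of exactly 986468 microseconds is counted as "slow", which is an assumption.

```python
from collections import Counter

# Thresholds taken from the figures quoted above (microseconds).
FAST_MAX_US = 29_177    # upper edge of the "normal" cluster
SLOW_MIN_US = 986_468   # lower edge of the roughly 1 second cluster

def classify(wait_us):
    """Label one wait as 'fast', 'between' or 'slow'."""
    if wait_us <= FAST_MAX_US:
        return "fast"
    if wait_us >= SLOW_MIN_US:   # assumption: the boundary value counts as slow
        return "slow"
    return "between"

def summarize(waits_us):
    """Return the fraction of waits that falls into each class."""
    counts = Counter(classify(w) for w in waits_us)
    total = len(waits_us)
    if total == 0:
        return {"fast": 0.0, "between": 0.0, "slow": 0.0}
    return {k: counts.get(k, 0) / total for k in ("fast", "between", "slow")}

if __name__ == "__main__":
    # Hypothetical sample, not the real measurements from this thread.
    sample = [53, 410, 2_900, 29_177, 500_000, 986_468, 999_870, 1_000_120]
    for label, frac in summarize(sample).items():
        print(f"{label:8s}{frac:6.1%}")
```

If almost nothing lands in "between", as in the numbers quoted above, the slow waits look like a fixed timeout rather than the tail of a normal I/O distribution.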

>
> <<other good stuff read, digested and snipped>>
>
> I think I'm up against a bug. I finally took a step back,
> stopped looking at Oracle, and looked at the machine.
>
> This is not (quite) as odd as it sounds, since the machine
> is over on a rack, quite a way from me.
>
> A quick RPM later gave me the Linux version of iostat.
>
> Running iostat -k 1 whilst running my slow test gives (sample snapshot)
>
> avg-cpu:    %user   %nice    %sys %iowait   %idle
>              0.00    0.00    0.00    0.00  100.00
>
> Device:     tps  kB_read/s  kB_wrtn/s    kB_read    kB_wrtn
> dev3-0     3.00       4.00       8.00          4          8
>
> avg-cpu:    %user   %nice    %sys %iowait   %idle
>              1.00    0.00    0.00    0.00   99.00
>
> Device:     tps  kB_read/s  kB_wrtn/s    kB_read    kB_wrtn
> dev3-0    12.00       4.00     112.00          4        112
>
> avg-cpu:    %user   %nice    %sys %iowait   %idle
>              1.00    0.00    2.00    0.00   97.00
>
> Device:     tps  kB_read/s  kB_wrtn/s    kB_read    kB_wrtn
> dev3-0     7.00       8.00      28.00          8         28
>
> avg-cpu:    %user   %nice    %sys %iowait   %idle
>              2.00    0.00    1.00    0.00   97.00
>
> Device:     tps  kB_read/s  kB_wrtn/s    kB_read    kB_wrtn
> dev3-0    13.00      16.00      64.00         16         64
>
> avg-cpu:    %user   %nice    %sys %iowait   %idle
>              0.00    0.00    1.00    0.00   99.00
>
> Device:     tps  kB_read/s  kB_wrtn/s    kB_read    kB_wrtn
> dev3-0     7.00       4.00      36.00          4         36
>
> Oracle is making heavy use of neither CPU nor IO!!!!
> (and neither is anything else...)
> It appears that the "log file sync waits" I'm seeing are more
> like sleeps(). It ain't even tryin'.
>
> Since the LGWR is a separate process, I start to (again) suspect Linux
> scheduling.
>
> BugBear
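If you capture a longer run of that iostat output, something like the following could pull out the lowest %idle and the highest tps, so you can see whether the box ever gets busy. This is only a rough sketch: it assumes the older sysstat layout shown in the quoted snapshot (an "avg-cpu:" header line followed by a value line, then a "Device:" header followed by device lines), and the function names are made up for illustration. Newer iostat versions print different columns.

```python
# Two samples copied from the quoted snapshot, used here as test input.
SAMPLE = """\
avg-cpu: %user %nice %sys %iowait %idle
0.00 0.00 0.00 0.00 100.00

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dev3-0 3.00 4.00 8.00 4 8

avg-cpu: %user %nice %sys %iowait %idle
1.00 0.00 0.00 0.00 99.00

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
dev3-0 12.00 4.00 112.00 4 112
"""

def parse_iostat(text):
    """Parse old-style `iostat -k` text into a list of per-interval samples."""
    samples = []
    cpu_cols, dev_cols = [], []
    expect = None  # which kind of data line should follow the last header
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "avg-cpu:":
            cpu_cols, expect = fields[1:], "cpu"
        elif fields[0] == "Device:":
            dev_cols, expect = fields[1:], "dev"
        elif expect == "cpu":
            cpu = dict(zip(cpu_cols, map(float, fields)))
            samples.append({"cpu": cpu, "devices": {}})
            expect = None
        elif expect == "dev" and samples:
            stats = dict(zip(dev_cols, map(float, fields[1:])))
            samples[-1]["devices"][fields[0]] = stats
        # anything else (e.g. the banner line iostat prints first) is ignored
    return samples

def busiest(samples):
    """Return (lowest %idle, highest tps) seen across all samples."""
    min_idle = min(s["cpu"]["%idle"] for s in samples)
    max_tps = max(d["tps"] for s in samples for d in s["devices"].values())
    return min_idle, max_tps

if __name__ == "__main__":
    idle, tps = busiest(parse_iostat(SAMPLE))
    print(f"lowest %idle: {idle:.2f}, highest tps: {tps:.2f}")
```

If %idle never drops far below 100 and tps stays in the low double digits for the whole test, that supports the reading that the waits are sleeps rather than real I/O.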

I'm seeing something very similar on a system running 10.1.0.3 during a
CREATE TABLE AS .... Try increasing the size of the log buffer
(LOG_BUFFER), possibly by a lot.

How many log files do you have? What size are they? How many groups,
and how many members per group?

ADDM gives this advice if you use Grid Control, and it works.

-- 
Daniel A. Morgan
http://www.psoug.org
damorgan_at_x.washington.edu
(replace x with u to respond)

Received on Thu Jun 30 2005 - 08:54:43 CDT