Re: Tuning 'log file sync'

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 5 Sep 2005 07:34:47 +0000 (UTC)
Message-ID: <dfgsem$el9$1@nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com>

"Vsevolod Afanassiev" <vafanassiev_at_yahoo.com> wrote in message news:1125615265.135322.10050_at_f14g2000cwb.googlegroups.com...
> One of our database experiences heavy waits on 'log file sync'.
> Top waits from V$SYSTEM_EVENT are given below. 'Log file sync'
> exceeds 'db file sequential read' and 'db file scattered read'.
> Metalink Note 223117.1 says that they key to understanding
> 'log file sync' is to compare average times waited for 'log file sync'
> and 'log file parallel write': if they are similar then
> it is I/O problem, but if they are different then "the delay is
> caused by the other parts of Redo Logging mechanism
> that occur during a COMMIT/ROLLBACK (and are not I/O-related)".
> How to find out what are these "other parts"?
>
> This is 9.2.0.5.0 on Tru64.
>
> Thanks
>
>
> Time Average
> Waited Wait
> Event (sec) (sec)
> ------------------------ ------- ------
> SQL*Net message from c 9836478 .19
> rdbms ipc message 798023 .30
> PX Idle Wait 763886 2.15
> enqueue 387408 .71
> log file sync 237811 .21
> pmon timer 190927 3.11
> smon timer 186319 5.18
> db file sequential read 85030 0
> db file parallel write 57577 .20
> local write wait 41371 .42
> rdbms ipc reply 32636 .33
> log file parallel write 27843 .03
> latch free 8538 .01
>

You do seem to have some unusual
activity there. Most surprising is the
smon timer, at an average of five seconds when the common smon timeout is every
five minutes.

Second on the list is the very low value for the average wait for SQL*Net message from client: Connor's observation that you are doing too many commits may be the basic problem - do you have several 'batch' processes that are running round and round in circles doing tiny bits of work in each cycle and then committing.

Your write times (db file parallel at 200 ms and log file parallel at 30 ms) are slow, which suggests you have an I/O problem. Again this may be a side effect of your coding strategy, exaggerated by an aggressive MTTR.

Log file sync is often a consequence of
log writer having trouble - but in this case, as you point out, the log file sync average is much higher than the log writer average, so it is POSSIBLE that there is another cause.

As others have pointed out, increasing the log buffer size can increase the average length of log file syncs - on the other hand, decreasing the log buffer can leave you
suffering from log buffer space waits - but the figures do not suggest that your 2M log buffer is too large.

You don't have any significant time waiting on 'LGWR waiting for redo', which can occur in a highly concurrent system as LGWR waits for other sessions to fill the log buffer that they have allocated but not yet copied into.

So the thought I would pursue is that the log writer is having trouble notifying the correct sessions that the log buffer has been written. Are you seeing many timeouts on log file sync ? How busy is the CPU ? How many processes Oracle processes are running on the machine ? How much non-Oracle CPU is being used ?

As a side-note, the average wait for enqueues is large. Take a look at v$enqueue_stats to find out which types of enqueue are suffering most waits and losing most time - this may be relevant. I have seen three cases of unusual log file sync times so far in 9.2 - one of them related to frequent truncation of global temporary tables, and the enqueue times may be giving us a hint that this is your problem.

-- 
Regards

Jonathan Lewis

Now waiting on the publishers:    Cost Based Oracle - Volume 1

http://www.jlcomp.demon.co.uk/faq/ind_faq.html
The Co-operative Oracle Users' FAQ

http://www.jlcomp.demon.co.uk/appearances.html
Public Appearances - schedule updated 8th July 2005

Received on Mon Sep 05 2005 - 02:34:47 CDT