RE: log buffer size and log file syncs

From: CRISLER, JON A <JC1706_at_att.com>
Date: Mon, 30 Apr 2012 14:56:22 +0000
Message-ID: <9F15274DDC89C24387BE933E68BE3FD325280F_at_MISOUT7MSGUSR9D.ITServices.sbc.com>

Interesting thoughts Tanel: in this case of this specific app, the majority of the work is made of up small commits to a handful of tables on a 6 node RAC cluster. I/O times are generally quite good, and with 32 cores per node the CPU and load average is very low. Its 11gR1 - I was wondering if some of the tweaks to put LGWR at "real time" priority that are mentioned for 10g also apply to 11g. From: tanel_at_poderc.com [mailto:tanel_at_poderc.com] On Behalf Of Tanel Poder Sent: Friday, April 27, 2012 6:09 PM
To: CRISLER, JON A
Cc: oracle-l
Subject: Re: log buffer size and log file syncs

Yep, it's possible in theory and practice too - if you have some large (parallel) transactions generating redo on multiple CPUs - which flood the log buffer. This can drive the log file sync times up, as there's sometimes tens or 100+ MB of redo waiting to be flushed before the commit can return.

With lower log buffer size, your large transaction would run slower as it would wait for log buffer space wait much more. Your smaller transactions wait for log buffer space as well, but as there's less redo to be flushed in the "queue" then they'll get their redo in there faster and also get the write done sooner too.

However - I would first look into what the LGWR itself is doing (how much CPU it's consuming vs waiting for IO, what's the average IO size (and redo size per message received by LGWR etc). A common cause for such symptoms is that LGWR just doesn't get enough CPU.

No need to guess, you can run snapper on LGWR too (snapper just queries v$ views so it doesn't matter whether it's a background or foreground session):

Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH


----------------------------------------------------------------------------------------------------------------------------------------

     98, (LGWR)    , STAT, non-idle wait count                                       ,             2,         .4,
     98, (LGWR)    , STAT, messages sent                                             ,             2,         .4,
     98, (LGWR)    , STAT, messages received                                         ,             2,         .4,
     98, (LGWR)    , STAT, background timeouts                                       ,             2,         .4,
     98, (LGWR)    , STAT, physical write total IO requests                          ,             3,         .6,
     98, (LGWR)    , STAT, physical write total multi block requests                 ,             1,         .2,
     98, (LGWR)    , STAT, physical write total bytes                                ,        232448,     46.49k,
     98, (LGWR)    , STAT, cell physical IO interconnect bytes                       ,        232448,     46.49k,
     98, (LGWR)    , STAT, redo wastage                                              ,            28,        5.6,
     98, (LGWR)    , STAT, redo writes                                               ,             1,         .2,
     98, (LGWR)    , STAT, redo blocks written                                       ,           274,       54.8,
     98, (LGWR)    , STAT, redo blocks checksummed by FG (exclusive)                 ,           180,         36,
     98, (LGWR)    , TIME, background elapsed time                                   ,           861,    172.2us,      .0%, |          |
     98, (LGWR)    , WAIT, rdbms ipc message                                         ,       4981797,   996.36ms,    99.6%, |_at_@@@@@@@@@|
     98, (LGWR)    , WAIT, log file parallel write                                   ,           752,    150.4us,      .0%, |          |

-- End of Stats snap 1, end 12-04-28 00:40:59, seconds=5

If you happen to run on Solaris, then you can also run prstat -mLp PID on LGWR's process to see how much time it waits in the CPU runqueue to get onto CPU (the LAT column - scheduling latency).

Another thing to definitely check is to see whether async IO is properly enabled, especially in the case of multiplexed redo logs...

--

Tanel Poder
Enkitec Europe
http://www.enkitec.com/
Advanced Oracle Troubleshooting v2.0 Seminars in May/June 2012! http://blog.tanelpoder.com/seminar/

On Fri, Apr 27, 2012 at 11:36 PM, CRISLER, JON A <JC1706_at_att.com<mailto:JC1706_at_att.com>> wrote: Has anybody run into a case where the log buffer size was too big, and negatively affected log file writes and log file sync waits ?

I have a system that ignores my parameter for 32mb log file size and sets it to 128mb, due to the granule size. The workaround is to lower the granule size so that I can manually set the redo log buffer lower. But I am not sure if this will have any positive affect on log file sync times. Normal I/O for log file writes is fast at 2-4 ms, but log file syncs are bad at 50 to 150 ms.
--

http://www.freelists.org/webpage/oracle-l

--

http://www.freelists.org/webpage/oracle-l Received on Mon Apr 30 2012 - 09:56:22 CDT

This message: [ Message body ]
Next message: Jeffrey Beckstrom: "Re: Windows 7"
Previous message: Zelli, Brian: "Windows 7"
In reply to: Tanel Poder: "Re: log buffer size and log file syncs"
Next in thread: saurabh manroy: "Re: log buffer size and log file syncs"
Reply: saurabh manroy: "Re: log buffer size and log file syncs"
Reply: Tanel Poder: "Re: log buffer size and log file syncs"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message