Re: intermittent long "log file sync" waits

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Sun, 2 Feb 2020 09:39:52 +0000
Message-ID: <LNXP265MB1562810EC75BB7A638891C97A5010_at_LNXP265MB1562.GBRP265.PROD.OUTLOOK.COM>



It be interesting to see the previous 5 or 6 seconds of lgwr WAITS - using the "tim=" values as indicators to get to a point before that log file sync started. 
The foreground doesn't have to send a message to the log writer if the log writer is already writing because it knows that LGWR will check if there's more to write before going to sleep. Possibly there's a race condition where the foreground decides that it doesn't need to send a message but the log writer has decided there's no more to write - the log writer's activity in the few seconds before it's 2.9 second wait might give us a clue.

I'm also a little puzzled that a foreground could wait 6.9 seconds.  I thought there was a timeout for LFS (either 1 second or 1/10th second - possibly version dependent, and settable by hidden parameter). So, unless the foreground times out but doesn't start a new wait, it can't wait for 6.9 seconds.

Purely for the sake of collecting information I think I'd take a look at smon_scn_time to to see what time the foregrounds  "sync scn" corresponded to, and to check the SCNs for the start and end time of the LFS wait.  



Regards
Jonathan Lewis
 


________________________________________
From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of Tanel Poder <tanel_at_tanelpoder.com>
Sent: 02 February 2020 03:50
To: Tanel Poder
Cc: Chris Stephens; Noveljic Nenad; oracle-l
Subject: Re: intermittent long "log file sync" waits

But your trace file here already indicates that there's some glitch regarding LGWR wakeups/posting:

Foreground:

*** 2020-01-29T13:04:20.724666-06:00
WAIT #0: nam='log file sync' ela= 6909711 buffer#=58996 sync scn=976089279 p3=0 obj#=-1 tim=1025228531958

LGWR at the same time:

*** 2020-01-29T13:04:20.725994-06:00
WAIT #0: nam='rdbms ipc message' ela= 2918443 timeout=300 p2=0 p3=0 obj#=-1 tim=1025228533313

The WAIT lines are printed out when the wait ends (and the *** timestamp line just before the WAIT line is printed out, if enough time has passed since the last *** line). So this shows that LGWR had happily been idle, waiting for its idle 3-second timeout to expire, or someone to wake it up (that's what "rdbms ipc message" means) - while someone had issued a commit and was waiting for LGWR acknowledgment. So this shows that LGWR didn't even get the post from your session, nor any other sessions (if you had concurrent transactional activity). Back in 11g, typically these problems happened when adaptive log file syncing switch between poll and post/wait wakeup method happened and you hit a bug. But in 12c there are more moving parts.

--
Tanel Poder
https://blog.tanelpoder.com/seminar

--
http://www.freelists.org/webpage/oracle-l Received on Sun Feb 02 2020 - 10:39:52 CET

Original text of this message