Re: intermittent long "log file sync" waits

From: Chris Stephens <cstephens16_at_gmail.com>
Date: Mon, 10 Feb 2020 16:16:13 -0600
Message-ID: <CAEFL0sx0Hy_dG103TFuqA0yqg+azEAQqnZLv-W53uxM+FKi+rA_at_mail.gmail.com>



I was able to capture x$trace output (event 10005, level 4) for a session experiencing a > 1 sec "log file sync" wait. In the awk filter below, $7 is the ela= value in microseconds, so $7 > 1000000 selects waits longer than one second:

$ awk '$0 ~ /^\*\*\* SESSION ID/ || ($7 > 1000000 && $0 ~ /log file sync/) {print $0}' lsst2db3_ora_1100.trc
*** SESSION ID:(8.36700) 2020-02-10T14:26:43.346885-06:00 WAIT #0: nam='log file sync' ela= 1165317 buffer#=18314 sync scn=1014204133 p3=0 obj#=-1 tim=534257573876
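While the session is still connected, the same records should be queryable straight from x$trace (column names here are my assumption for 19c, so treat this as a sketch):

select time, event, data
  from x$trace
 where sid = 8
   and data like '%log file sync%';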

However, I don't see any rows matching data like '%log file%':

SQL> select count(*) from xt_snaps where sid = 8 and serial# = 36700;

  COUNT(*)
----------
        88

SQL> select count(*) from xt_snaps where sid = 8 and serial# = 36700 and data like '%log file%';

  COUNT(*)
----------
         0
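One possible explanation: x$trace is a circular in-memory buffer, so entries can be overwritten between polls of the capture script. The snapshot step amounts to something like this (hypothetical reconstruction, since the real script isn't shown here; the x$trace column names are my assumption for 19c):

insert into xt_snaps (sid, serial#, time, event, data)
select s.sid, s.serial#, t.time, t.event, t.data
  from x$trace t
  join v$session s on s.sid = t.sid
 where t.event = 10005;   -- keep only rows written by event 10005
commit;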

Here is a gist of all the data corresponding to that sid/serial# (I didn't see anything interesting):

https://gist.github.com/cs2018ncsa/b2a4e235dc68f6fe240461f4d2035e96

To enable that 10005:4 event, I ran the following (generated by a script that executes every 5 seconds looking for new sessions running this workload, sketched below), so I think I got things right:

alter tracing enable "10005:4:8";
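The polling script itself isn't shown, but it amounts to something like this (hypothetical sketch; the USERNAME predicate is made up, and the trace string is "event:level:sid", matching the statement above):

begin
  for s in (select sid
              from v$session
             where username = 'WORKLOAD_USER'          -- assumption
               and logon_time > sysdate - 5 / 86400)   -- new in the last 5 seconds
  loop
    execute immediate 'alter tracing enable "10005:4:' || s.sid || '"';
  end loop;
end;
/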

Oracle support recommended we set the following and test ("Here are additional steps I would like you to try"):

ALTER SYSTEM SET "_high_priority_processes"='LMS*|LGWR|PMON' scope=spfile;
ALTER SYSTEM SET "_use_single_log_writer"=TRUE scope=spfile;   <<-- restart required
ALTER SYSTEM SET "_use_adaptive_log_file_sync"=FALSE;
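One way to confirm the values actually in effect after the restart (run as SYS; the standard x$ksppi / x$ksppcv join):

select i.ksppinm  parameter,
       v.ksppstvl current_value
  from x$ksppi  i,
       x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm in ('_high_priority_processes',
                     '_use_single_log_writer',
                     '_use_adaptive_log_file_sync');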

We did that and still experienced the long LFS waits.

Not sure where to go from here, except to the fridge to grab a beer.

On Fri, Feb 7, 2020 at 9:41 AM Priit Piipuu <priit.piipuu_at_gmail.com> wrote:

> Hi,
>
> 19c RAC and "log file sync" without associated dbwr activity rings a bell.
> I noticed a similar thing while investigating DRM-related problems. In our
> case the GRD gets locked for a few seconds during remastering, and that
> totally wrecks certain DML-heavy workloads. But it looked like collateral
> damage and I didn't bother to investigate further.
> Of course, with remastering going wrong there would be "gcs drm freeze in
> server mode" wait events, and probably some other gcs-related stuff from
> the "Other" wait class. That's not apparent from the traces below.
>
>
> On Tue, 28 Jan 2020 at 17:11, Chris Stephens <cstephens16_at_gmail.com>
> wrote:
>
>> 3-node Oracle 19.3 RAC
>> CentOS 7

--
http://www.freelists.org/webpage/oracle-l