Re: intermittent long "log file sync" waits

From: Chris Stephens <cstephens16_at_gmail.com>
Date: Wed, 5 Feb 2020 15:18:53 -0600
Message-ID: <CAEFL0szvhf1rAKhak_oyyUERtzvQLpN9++S75DDsd+ViV5sFgA_at_mail.gmail.com>



I had Monday off. Tuesday we rolled back the /tmp ramdisk changes but were still unable to reproduce the issue. Today we ran the workload against another 3-node RAC cluster w/ identical hardware. It's my understanding that the only things the two clusters share are the switch between the nodes and the storage arrays. ASM in each cluster writes to 2 separate storage arrays. This morning we had 1 session experience the issue between 11:09:32 and 11:11:23, w/ several ~3 second waits on LFS. I don't think the waits are *always* ~3 seconds, but that's interesting nonetheless.

mpstat shows no CPU issues during the problem interval. iostat shows slowness on local disk only (/boot, /tmp, /), but that slowness is not specific to the problem interval.

Frits' log writer stats account for barely any time since instance start at 9:30 this morning.

SQL> @lgwr_stats
The different stages are:
1/ gather time: includes making sure the redo in the public strand is done writing, SCN adjustment and redo preparation.
2/ schedule time: includes gather time and all the write preparation up to but excluding the actual write.
3/ finish time: includes the previous times but adds the actual write calls.
4/ total time: includes the previous times but adds posting the foregrounds.

                       NAME    time (us)
___________________________ ____________
redo write gather time                81
redo write schedule time               5
redo write finish time               324
redo write total time                 10
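
For anyone without the script: those four timings appear to be the cumulative microsecond statistics of the same names in v$sysstat, so a plain query along these lines should show the same numbers (treat the exact statistic names as version-dependent):

select name, value
from   v$sysstat
where  name in ('redo write gather time',
                'redo write schedule time',
                'redo write finish time',
                'redo write total time');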

ASH data shows the same behavior as before.
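
A query along these lines against ASH is enough to see it (the window below is this morning's problem interval):

select sample_time, inst_id, session_id, event, time_waited
from   gv$active_session_history
where  event = 'log file sync'
and    sample_time between timestamp '2020-02-05 11:09:00'
                       and timestamp '2020-02-05 11:12:00'
order  by sample_time;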

No log switches on any instance at that time.

SQL> select inst_id,to_char(first_time,'HH24:MI:SS') from gv$log_history where first_time > trunc(sysdate) order by first_time asc;

   INST_ID TO_CHAR(FIRST_TIME,'HH24:MI:SS')
__________ ___________________________________
         2 08:22:32
         1 08:22:32
         2 08:22:32
         3 08:22:32
         1 08:22:32
         3 08:22:32
         1 11:53:06
         2 11:53:06
         3 11:53:06

Nothing in the database alert log.

Nothing in the LGWR trace file showing any change in LGWR behavior (e.g., the adaptive log file sync switch between post/wait and polling).
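
If anyone wants to check the adaptive mechanism without digging through trace files: polling activity shows up in v$sysstat, and the feature is controlled by the hidden parameter "_use_adaptive_log_file_sync" (statistic names as I remember them, so verify against your version):

select name, value
from   v$sysstat
where  name in ('redo synch polls', 'redo synch poll writes');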

There are some weird entries (constant, i.e. not specific to this time interval) in the ASM LMHB trace file, like:

2020-02-05T15:07:22.198171-06:00 kjgcr_DoAction: KJGCR_ACTION - id:35 gcr disk io failure ran with status:0 clr:1
2020-02-05T15:07:24.301581-06:00 kjgcr_DoPreAction: KJGCR_PREACTION - gcr disk io failure - id:35
2020-02-05T15:07:24.301660-06:00 kjgcr_DoAction: KJGCR_ACTION - gcr disk io failure - id:35
2020-02-05T15:07:24.301688-06:00 kjgcr_DoAction: KJGCR_ACTION - id:35 gcr disk io failure ran with status:0 clr:1

I have an SR open for this.

No concurrent database activity.

Looking at the raw 10046 trace data, 100% of the LFS waits are associated w/ cursor #0, and it's always that way. Does anyone know what that cursor represents?

Before the very first LFS wait in the trace file I do see some RAC-related waits, but not before any of the other long LFS waits, and ela is low on all of them:

WAIT #140177208419296: nam='DLM cross inst call completion' ela= 369 caller instance number=3 cluster incarnation number=6 request identifier=279706 obj#=318298 tim=1910370663280
WAIT #140177208419296: nam='DLM cross inst call completion' ela= 288 caller instance number=3 cluster incarnation number=6 request identifier=279707 obj#=318298 tim=1910370663790
WAIT #140177208419296: nam='DLM cross inst call completion' ela= 9 caller instance number=3 cluster incarnation number=6 request identifier=279705 obj#=318298 tim=1910370664059
WAIT #140177208419296: nam='DLM cross inst call completion' ela= 8 caller instance number=3 cluster incarnation number=6 request identifier=279704 obj#=318298 tim=1910370664153
WAIT #140177208419296: nam='gc current multi block request' ela= 318 file#=21 block#=609247 id#=33554433 obj#=318297 tim=1910370666134
EXEC #140177208419296:c=15522,e=22340,p=0,cr=66,cu=109,mis=1,r=1,dep=0,og=1,plh=0,tim=1910370666356
STAT #140177208419296 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL POSIX_DATASTORE_RECORDS (cr=4 pr=0 pw=0 str=1 time=840 us)'
WAIT #140177208419296: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=318297 tim=1910370666499
WAIT #140177208419296: nam='SQL*Net message from client' ela= 576 driver id=1413697536 #bytes=1 p3=0 obj#=318297 tim=1910370667101
XCTEND rlbk=0, rd_only=0, tim=1910370667186
WAIT #0: nam='log file sync' ela= 317255 buffer#=49479 sync scn=20971597 p3=0 obj#=318297 tim=1910370984709
WAIT #0: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=318297 tim=1910370984802
WAIT #0: nam='SQL*Net message from client' ela= 902 driver id=1413697536 #bytes=1 p3=0 obj#=318297 tim=1910370985733
CLOSE #140177209347064:c=20,e=20,dep=0,type=1,tim=1910370985827

orachk found no issues with the interconnect. The AWR RAC report looked good to me, and I don't see any errors on the interface.
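
For reference, gv$cluster_interconnects confirms which interface the interconnect is actually using, which is worth checking before trusting any one interface's error counters:

select inst_id, name, ip_address, is_public, source
from   gv$cluster_interconnects;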

I think that covers all suggestions so far. I'm basically back to square one.

On Mon, Feb 3, 2020 at 12:44 AM Jon Crisler <joncrisler_at_gmail.com> wrote:

> To follow up on what Jonathan Lewis mentioned earlier.
>
> If "log file sync" becomes slow, but "log file parallel writes" are fast,
> then you can suspect cpu starvation or a OS scheduling problem
> Check the hidden parameters in your database for something called "_high_priority_processes"
> .
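> The usual hidden-parameter query (run as SYS; the x$ views aren't visible
> to normal users) will show it:
>
> select i.ksppinm name, v.ksppstvl value
>   from x$ksppi i, x$ksppcv v
>  where i.indx = v.indx
>    and i.ksppinm = '_high_priority_processes';
>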
> When dealing with RAC on Linux, the default behavior for an LGWR write is
> to wait for the write to be sent to the other RAC nodes via LMS; when that
> is completed, LGWR is allowed to acknowledge the write.
> There is a frequent issue where the "oradism" program fails to upgrade the
> scheduling "class" of the process, such as RT (real time), TS (time share),
> etc. Do not take these classes as gospel; they are more like "strong
> suggestions" in regard to OS scheduling of the process. When oradism fails
> at this task, it is usually due to a security issue, a setuid problem, an
> antivirus program, or even a glibc issue.
>
> This parameter should include both LGWR* and LMS*, along with a few other
> processes that change from version to version. Check your parameter and
> see whether both of those are present.
> When checking the alert.log on each RAC node, look for any errors or
> warnings during or shortly after instance startup pertaining to "oradism";
> just grep the alert.log for that string.
> If you see something like "reducing LMS from X to 1" or similar, you may
> have a problem with the setting or with oradism. You can also use the
> "chrt" command in Linux to see a process's current scheduling class. I
> wrote a script that can check it from the OS and optionally upgrade it
> without bouncing the instance.
>
> My quick checks from memory (see the query sketch after this list):
> 1) If "log file parallel write" is fast but "log file sync" is slow,
> suspect CPU starvation or an OS scheduling issue. Example: log file
> parallel write = 2ms, log file sync = 30ms.
> 2) If "log file parallel write" and "log file sync" are both fast, then
> things are working fine. Example: log file parallel write = 2ms, log file
> sync = 3ms.
> 3) If "log file parallel write" and "log file sync" are both slow, then
> suspect sub-optimal disk performance. Example: log file parallel write =
> 30ms, log file sync = 35ms.
> 4) If the private interconnect is congested, full of errors or retries, or
> otherwise a problem, then the behavior can vary until that is addressed.
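>
> A rough way to eyeball those averages (cumulative since instance startup,
> so use AWR for per-interval values):
>
> select inst_id, event,
>        round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) avg_ms
>   from gv$system_event
>  where event in ('log file sync', 'log file parallel write')
>  order by inst_id, event;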
>
> The other thing to check is odd behavior at the disk level. Things like
> storage snapshots, deduplication or data-reduction tasks, or other storage
> work can play havoc with your metrics and leave you scratching your head,
> so try to see if there is a pattern (load, time, etc.) and consult your
> storage administrator. Having something like OSWatcher, network stats, and
> private interconnect stats for drops/errors is very helpful.
>
> On Tue, Jan 28, 2020 at 11:10 AM Chris Stephens <cstephens16_at_gmail.com>
> wrote:
>
>> 3-node Oracle 19.3 RAC
>> Centos 7
>>
>> We have a SQLAlchemy/Python-based application workload that is running
>> the exact same steps with widely varying response times, which appear to
>> be related to varying "log file sync" wait times.
>>
>> Any suggestions to discover the root cause and/or a solution?
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 05 2020 - 22:18:53 CET
