Re: intermittent long "log file sync" waits

From: Chris Stephens <cstephens16_at_gmail.com>
Date: Fri, 7 Feb 2020 09:57:02 -0600
Message-ID: <CAEFL0szw-bZAN5Jn7Qg1Qb4wGWHEHmGfu2hGwxNd4fW-OBQ8fQ_at_mail.gmail.com>



While I work on a way to enable 10005:4 for new sessions associated with the problem workload (it doesn't look like I can use a logon trigger / execute immediate for this), we have shut down 2 of the 3 instances and are again testing to see whether that changes anything.
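
In the meantime, something like this should list the Oracle PIDs of the new
sessions so tracing can be switched on for them by hand (just a sketch; the
GEN3_MGOWE filter and the 5-minute window are assumptions on my part):

select s.sid, s.serial#, p.pid as oracle_pid, p.spid
from   v$session s
join   v$process p on p.addr = s.paddr
where  s.username = 'GEN3_MGOWE'          -- assumed problem-workload schema
and    s.logon_time > sysdate - 5/1440;   -- sessions started in the last 5 minutes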

@Priit, I think that would eliminate any remastering/RAC-related sync actions.

On Fri, Feb 7, 2020 at 7:53 AM Chris Stephens <cstephens16_at_gmail.com> wrote:

> Crystal clear explanation of everything, Tanel. I owe you a beer (or
> several) if/when we meet in real life.
>
> I'm going to see if we can catch the x$trace post/wait data, as well as
> psnapper on the FG process and LGWR, today. Will post any updates.
>
> It is odd that a reboot of the nodes seems to fix the issue for a limited
> time. I also keep going back to the slow writes to /tmp, knowing that GI
> makes use of /tmp/.oracle.
>
> You are correct: not Exadata and not production.
>
> Thanks again! Hope you have the time to get that big announcement out on
> Twitter today! Love all those hacking sessions. I've learned to just block
> out an extra hour or two after the scheduled end time. :)
>
> On Thu, Feb 6, 2020 at 10:43 PM Tanel Poder <tanel_at_tanelpoder.com> wrote:
>
>> Yeah, the "FG wakeup overhead per log file sync" derived metric
>> definitely shows that your foregrounds wait for log file syncs way longer
>> than the corresponding LGWR write duration. That metric divides "redo
>> synch time overhead (usec)" by "redo synch writes" to get an average
>> extra latency per foreground log file sync. Normally it's just
>> microseconds, but in the output you sent it was close to 900
>> *milliseconds* per sync in some cases.
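>>
>> If you want to pull the same derived metric straight from V$SESSTAT, here
>> is a rough sketch (note these counters are cumulative since session start,
>> unlike snapper's deltas; SID 1717 is just the session from your output):
>>
>> SELECT ROUND(
>>          MAX(DECODE(n.name, 'redo synch time overhead (usec)', s.value)) /
>>          NULLIF(MAX(DECODE(n.name, 'redo synch writes', s.value)), 0) / 1000, 2)
>>          AS fg_wakeup_overhead_ms_per_sync
>> FROM   v$sesstat s
>> JOIN   v$statname n ON n.statistic# = s.statistic#
>> WHERE  s.sid = 1717
>> AND    n.name IN ('redo synch time overhead (usec)', 'redo synch writes');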
>>
>> I've slightly edited one of the snapper output sections and removed some
>> less interesting lines:
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>     SID @INST, USERNAME  , TYPE, STATISTIC                              ,    DELTA, HDELTA/SEC, %TIME, GRAPH       , NUM_WAITS, WAITS/SEC, AVERAGES
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> *  1717  @3  , GEN3_MGOWE, STAT, user commits                           ,        6,        .49,      ,             ,          ,          , .08 per execution
>> *  1717  @3  , GEN3_MGOWE, STAT, messages sent                          ,        6,        .49,      ,             ,          ,          , .13 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, calls to kcmgas                        ,        6,        .49,      ,             ,          ,          , .27 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, redo entries                           ,       64,       5.21,      ,             ,          ,          , 1.89 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, redo size                              ,    22532,      1.83k,      ,             ,          ,          , 5.63k bytes per user commit
>>    1717  @3  , GEN3_MGOWE, STAT, redo synch time                        ,     1190,      96.85,      ,             ,          ,          , 3.35 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, redo synch time (usec)                 , 11902840,    968.69k,      ,             ,          ,          , 33.53k per execution
>> *  1717  @3  , GEN3_MGOWE, STAT, redo synch time overhead (usec)        ,  7933229,    645.63k,      ,             ,          ,          , 274.72ms FG wakeup overhead per log file sync
>>    1717  @3  , GEN3_MGOWE, STAT, redo synch time overhead count ( 2ms)  ,        2,        .16,      ,             ,          ,          , .06 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, redo synch time overhead count (inf)   ,        4,        .33,      ,             ,          ,          , .01 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, redo synch writes                      ,        6,        .49,      ,             ,          ,          , .1 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, redo synch long waits                  ,        4,        .33,      ,             ,          ,          , .02 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, redo write info find                   ,        6,        .49,      ,             ,          ,          , .08 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, undo change vector size                ,     5664,     460.95,      ,             ,          ,          , 101.03 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, table fetch by rowid                   ,        6,        .49,      ,             ,          ,          , 3.33 per execution
>>    1717  @3  , GEN3_MGOWE, STAT, rows fetched via callback              ,        6,        .49,      ,             ,          ,          , .35 per execution
>> *  1717  @3  , GEN3_MGOWE, WAIT, log file sync                          , 12202261,   993.06ms, 99.3%, [WWWWWWWWWW],         6,       .49, 2.03s average wait
>>    1717  @3  , GEN3_MGOWE, WAIT, SQL*Net message to client              ,       98,     7.98us,   .0%, [          ],        60,      4.88, 1.63us average wait
>>    1717  @3  , GEN3_MGOWE, WAIT, SQL*Net message from client            ,    94111,     7.66ms,   .8%, [W         ],        60,      4.88, 1.57ms average wait
>>
>> -- End of Stats snap 247, end=2020-02-05 17:41:57, seconds=12.3
>>
>> LGWR (as seen in another trace file) was almost completely idle during
>> the problem time.
>>
>> Also, from the wait events and stats we can see that LGWR did all the
>> work by itself; the LG* slaves weren't active (other than polling the
>> current SCN very briefly in one snapshot).
>>
>>    1954  @3  , (LG00)    , WAIT, LGWR worker group idle                 , 12000000,    962.4ms, 96.2%, [WWWWWWWWWW],         0,         0, 12s average wait
>>    2198  @3  , (LG01)    , WAIT, LGWR worker group idle                 , 12000000,    962.4ms, 96.2%, [WWWWWWWWWW],         0,         0, 12s average wait
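>>
>> If you want a quick live check of the same thing without snapper, something
>> along these lines (just a sketch) shows what LGWR and its worker processes
>> are currently waiting on:
>>
>> SELECT s.inst_id, p.pname, s.sid, s.event, s.state, s.seconds_in_wait
>> FROM   gv$session s
>> JOIN   gv$process p ON p.addr = s.paddr AND p.inst_id = s.inst_id
>> WHERE  p.pname IN ('LGWR', 'LG00', 'LG01');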
>>
>> Since we don't see any FG metrics like *redo synch polls* and *redo
>> synch poll writes*, it looks like you're using the plain old post/wait
>> mechanism with LGWR. This looks like a case of a missed post to me.
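>>
>> You can double-check which mode is in use from the system-wide counters
>> too, for example (if adaptive/polling mode were active, the poll stats
>> would be non-zero and increasing):
>>
>> SELECT name, value
>> FROM   v$sysstat
>> WHERE  name IN ('redo synch writes', 'redo synch polls', 'redo synch poll writes');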
>>
>> As I understand it, this is not production, so you could use X$TRACE to
>> dig deeper into the POST/WAIT messages. X$TRACE is actually always enabled
>> anyway (just not all events), so it should be safe for prod too; at least
>> I haven't crashed anything with it yet.
>>
>> You'd need to enable KST event 10005 at level 4 for just one test session
>> and for LGWR. I've used the Oracle PID below: 64 is the Oracle PID (not
>> SPID) that my session is using. You would need to do the same for LGWR.
>>
>> SQL> ALTER TRACING ENABLE "10005:4:64";
>>
>> Tracing altered.
>>
>> And since X$TRACE is a circular buffer, you'd need to quickly dump the
>> X$TRACE buckets to disk or just CTAS it somewhere whenever you see a long
>> log file sync. You could use some PL/SQL loop that polls ASH or V$SESSION
>> and does such an INSERT-SELECT from X$TRACE whenever it sees anyone waiting
>> for "log file sync" for longer than 1 sec. You could also make the X$TRACE
>> buffers bigger to have a bit more time to react.
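>>
>> A rough, untested sketch of that polling loop (the XT_DUMP table name and
>> the thresholds are made up; you'd run it as SYS since X$TRACE isn't visible
>> to normal users):
>>
>> -- one-off: an empty copy of X$TRACE plus a capture timestamp column
>> CREATE TABLE xt_dump AS
>>   SELECT SYSTIMESTAMP AS capture_time, t.* FROM x$trace t WHERE 1 = 0;
>>
>> DECLARE
>>   l_waiters PLS_INTEGER;
>> BEGIN
>>   LOOP
>>     SELECT COUNT(*) INTO l_waiters
>>     FROM   v$session
>>     WHERE  event = 'log file sync'
>>     AND    state = 'WAITING'
>>     AND    seconds_in_wait >= 1;
>>     IF l_waiters > 0 THEN
>>       -- a long sync is in progress: snapshot the circular buffer before it wraps
>>       INSERT INTO xt_dump SELECT SYSTIMESTAMP, t.* FROM x$trace t;
>>       COMMIT;
>>       EXIT;                  -- one capture is enough for a first look
>>     END IF;
>>     DBMS_LOCK.SLEEP(0.5);    -- or DBMS_SESSION.SLEEP on 18c+
>>   END LOOP;
>> END;
>> /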
>>
>> The output would give you a timeline of session wait events *and* POSTS
>> sent and received. You can use the "relative_time_sec" column to see when
>> the POSTs were actually sent by your FG and received by LGWR ... and vice
>> versa.
>>
>> SQL> @xt sid=509
>>
>> relative_time_sec  event  file.c          FUNCTION               sid  data
>> -----------------  -----  --------------  ---------------------  ---  ------------------------------------------------------------------------------
>> 634363269.049747   10005  ksl2.c@3349     kslwtbctx              509  KSL WAIT BEG [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2764 seq_num=2766 snap_id=1
>> 634363279.784638   10005  ksl2.c@3492     kslwtectx              509  KSL WAIT END [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2764 seq_num=2766 snap_id=1
>> 634363279.784641   10005  ksl2.c@3496     kslwtectx              509  KSL WAIT END wait times (usecs) - snap=10734887, exc=10734887, tot=10734887
>> 634363279.784850   10005  ksl2.c@19892    ksl_update_post_stats  509  *KSL POST SENT postee=20 num=3 loc='ksa2.h LINE:292 ID:ksasnd' flags=0x0 id1=0 id2=0*
>> 634363279.784983   10005  ksl2.c@3349     kslwtbctx              509  KSL WAIT BEG [Disk file operations I/O] 8/0x8 0/0x0 8/0x8 wait_id=2765 seq_num=2767 snap_id=1
>> 634363279.785058   10005  ksl2.c@3492     kslwtectx              509  KSL WAIT END [Disk file operations I/O] 8/0x8 0/0x0 8/0x8 wait_id=2765 seq_num=2767 snap_id=1
>> 634363279.785060   10005  ksl2.c@3496     kslwtectx              509  KSL WAIT END wait times (usecs) - snap=77, exc=77, tot=77
>> 634363279.785207   10005  ksl2.c@3349     kslwtbctx              509  *KSL WAIT BEG [log file sync] 4172/0x104c 143746492/0x89165bc 0/0x0 wait_id=2766 seq_num=2768 snap_id=1*
>> 634363279.785888   10005  ksl2.c@18157    ksliwat                509  KSL FACILITY WAIT fac#=3 time_waited_usecs=678
>> 634363279.785895   10005  ksl2.c@18276    ksliwat                509  *KSL POST RCVD poster=22 num=8332 loc='ksl2.h LINE:4384 ID:kslpsr' id1=162 id2=0 name=EV type=0 fac#=3 posted=0x3 may_be_posted=1*
>> 634363279.785904   10005  ksl2.c@3492     kslwtectx              509  *KSL WAIT END [log file sync] 4172/0x104c 143746492/0x89165bc 0/0x0 wait_id=2766 seq_num=2768 snap_id=1*
>> 634363279.785905   10005  ksl2.c@3496     kslwtectx              509  KSL WAIT END wait times (usecs) - snap=685, exc=685, tot=685
>> 634363279.785948   10005  ksl2.c@3349     kslwtbctx              509  KSL WAIT BEG [SQL*Net message to client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2767 seq_num=2769 snap_id=1
>> 634363279.785952   10005  ksl2.c@3492     kslwtectx              509  KSL WAIT END [SQL*Net message to client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2767 seq_num=2769 snap_id=1
>> 634363279.785953   10005  ksl2.c@3496     kslwtectx              509  KSL WAIT END wait times (usecs) - snap=6, exc=6, tot=6
>> 634363279.785964   10005  ksl2.c@3349     kslwtbctx              509  KSL WAIT BEG [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2768 seq_num=2770 snap_id=1
>>
>> The *KSL POST SENT postee=20* shows you the Oracle PID of LGWR in the
>> above example. My FG session sent a wakeup message to LGWR.
>>
>> The *KSL POST RCVD poster=22* shows that the "OK" reply came back from
>> another process (Oracle PID 22). This was the LG00 log writer slave in my
>> case. The LGnn slaves can post back to the foregrounds for scalability
>> (so that LGWR itself wouldn't spend so much time in the semaphore
>> operating system calls).
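>>
>> To map those Oracle PIDs back to process names, a quick lookup along these
>> lines does the trick:
>>
>> SELECT pid, spid, pname FROM v$process WHERE pid IN (20, 22);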
>>
>> At the end of the day, this would still just confirm that LGWR misses a
>> post for some reason; you'd still need to find out which bug causes it.
>>
>> This doesn't seem to be Exadata, right? LGWR and commit processing with
>> Exafusion & Smart Fusion Block Transfer have gotten more complex (and
>> faster): the GC layer doesn't have to wait for log file sync completion
>> before shipping a modified block to a remote instance, but commits on that
>> remote instance still have to confirm that the log write on the original
>> instance has completed. So there are additional potential
>> synchronization/wait points in that configuration. This feature should be
>> Exadata-only, but maybe some artifacts of that implementation affect the
>> non-Exa config too.
>>
>> One thing of interest: the FG snapper output shows a short "IPC group
>> service call" wait event right after a long log file sync ended, and that
>> is related to the Exafusion KSIPC layer.
>>
>> P.S. For people wondering, I talk about X$TRACE and advanced wait event &
>> post-wait topics in this hacking session:
>> https://www.youtube.com/watch?v=mkmvZv58W6w
>>
>> --
>> Tanel Poder
>> https://tanelpoder.com/seminar
>>
>> On Thu, Feb 6, 2020 at 1:55 PM Chris Stephens <cstephens16_at_gmail.com>
>> wrote:
>>
>>> At Sayan's excellent suggestion, here is the full snapper output from the
>>> foreground sessions and lg* processes:
>>>
>>> https://gist.github.com/cs2018ncsa/e9b0d442247f1434ceef21124bef18c3
>>>
>>> On Thu, Feb 6, 2020 at 10:17 AM Chris Stephens <cstephens16_at_gmail.com>
>>> wrote:
>>>
>>>> Resending with the inline snapper foreground output showing the max "FG
>>>> wakeup overhead per log file sync" time while I wait for the moderator to
>>>> approve the previous email with attachments of the LGWR and foreground
>>>> snapper output from a problem interval that occurred later in the evening
>>>> last night. Like I said in the previous email, I'm not exactly sure how to
>>>> zero in on the issue, but the "FG wakeup.." times look extremely suspicious
>>>> to me.
>>>>
>>>> FG:
>>>>
>>>> *** 2020-02-05T17:44:15.593425-06:00
>>>>    1474  @3  , GEN3_MGOWE, WAIT, SQL*Net message from client            , 12000000,   953.02ms, 95.3%, [WWWWWWWWWW],         0,         0, 12s average wait
>>>>    1717  @3  , GEN3_MGOWE, STAT, Requests to/from client                ,       54,       4.29,      ,             ,          ,          , .96 per execution
>>>>    1717  @3  , GEN3_MGOWE, STAT, opened cursors cumulative              ,       22,       1.75,      ,             ,          ,          , .7 per execution
>>>>    1717  @3  , GEN3_MGOWE, STAT, user commits                           ,        5,         .4,      ,             ,          ,          , .11 per execution
>>>>    1717  @3  , GEN3_MGOWE, STAT, user calls                             ,       74,       5.88,      ,             ,          ,          , 1.3 per execution
>>>>    1717  @3  , GEN3_MGOWE, STAT, recursive calls                        ,       21,       1.67,      ,             ,          ,          , 251.62us recursive CPU per recursive call
>>>>    1717  @3  , GEN3_MGOWE, STAT, session logical reads                  ,      167,      13.26,      ,             ,          ,          , 20.64k total buffer visits
>>>>    1717  @3  , GEN3_MGOWE, STAT, CPU used when call started             ,        2,        .16,      ,             ,          ,          , 760ms total CPU used when call started
>>>>    1717  @3  , GEN3_MGOWE, STAT, CPU used by this session               ,        2,        .16,      ,             ,          ,          , 760ms total CPU used by this session
>>>>    1717  @3  , GEN3_MGOWE, STAT, DB time                                ,     1016,      80.69,      ,             ,          ,          , 12.8 per execution
>>>>    1717  @3  , GEN3_MGOWE, STAT, non-idle wait time                     ,     1234,         98,      ,             ,          ,          , 12.93 per execution
>>>>    1717  @3  , GEN3_MGOWE, STAT, non-idle wait count                    ,      194,      15.41,      ,             ,          ,          , 3.72 per execution
>>>>    1717  @3  , GEN3_MGOWE, STAT, process last non-idle time             ,       10,        .79,      ,             ,          ,          , 1.33M per execution
>>>>    1717  @3  , GEN3_MGOWE, STAT, messages sent                          ,        6,        .48,      ,             ,          ,          , .14 per execution
>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Feb 07 2020 - 16:57:02 CET
