Re: intermittent long "log file sync" waits

From: Chris Stephens <cstephens16_at_gmail.com>
Date: Fri, 7 Feb 2020 07:53:03 -0600
Message-ID: <CAEFL0szzzNE+nH64MiavUoEGadXxTOaGz5r1e-bqApv_G4W+Ag_at_mail.gmail.com>



Crystal clear explanation for everything, Tanel. I owe you a beer (or several) if/when we meet in real life.

I'm going to see if we can catch the x$trace post/wait data, as well as psnapper output for the FG process and LGWR, today. Will post any updates.

It is odd that a reboot of the nodes seems to fix the issue for a limited time. I also keep going back to the slow writes to /tmp, knowing GI makes use of /tmp/.oracle.

You are correct: not Exadata and not production.

Thanks again! Hope you have the time to get that big announcement out on Twitter today! Love all those hacking sessions. I've learned to just block out an extra hour or two after the scheduled end time. :)

On Thu, Feb 6, 2020 at 10:43 PM Tanel Poder <tanel_at_tanelpoder.com> wrote:

> Yeah, the "FG wakeup overhead per log file sync" derived metric definitely
> shows that your foregrounds wait for log file syncs way longer than the
> corresponding LGWR write duration. That metric divides "redo synch time
> overhead (usec)" by "redo synch writes" to get an average extra latency
> per foreground log file sync. Normally it's just microseconds, but in the
> output you sent it was close to 900 *milliseconds* per sync in some cases.
>
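> For reference, a minimal sketch of the same calculation straight from
> V$SESSTAT (the statistic names are the ones quoted above; :sid is just a
> placeholder for the foreground session you're checking, and note this is
> cumulative since logon rather than a snapper-style delta):
>
> SQL> SELECT ROUND(
>             MAX(DECODE(n.name, 'redo synch time overhead (usec)', s.value)) /
>             NULLIF(MAX(DECODE(n.name, 'redo synch writes', s.value)), 0)
>           ) AS avg_fg_wakeup_overhead_usec   -- extra latency per log file sync
>      FROM   v$sesstat s
>      JOIN   v$statname n ON n.statistic# = s.statistic#
>      WHERE  s.sid = :sid
>      AND    n.name IN ('redo synch time overhead (usec)', 'redo synch writes');
>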
> I've slightly edited one of the snapper output sections and removed some
> less interesting lines:
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>  SID @INST, USERNAME  , TYPE, STATISTIC                               ,    DELTA, HDELTA/SEC, %TIME, GRAPH       , NUM_WAITS, WAITS/SEC, AVERAGES
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>  1717  @3, GEN3_MGOWE, STAT, user commits                             ,        6,        .49,      ,             ,          ,          , .08 per execution
>  1717  @3, GEN3_MGOWE, STAT, messages sent                            ,        6,        .49,      ,             ,          ,          , .13 per execution
>  1717  @3, GEN3_MGOWE, STAT, calls to kcmgas                          ,        6,        .49,      ,             ,          ,          , .27 per execution
>  1717  @3, GEN3_MGOWE, STAT, redo entries                             ,       64,       5.21,      ,             ,          ,          , 1.89 per execution
>  1717  @3, GEN3_MGOWE, STAT, redo size                                ,    22532,      1.83k,      ,             ,          ,          , 5.63k bytes per user commit
>  1717  @3, GEN3_MGOWE, STAT, redo synch time                          ,     1190,      96.85,      ,             ,          ,          , 3.35 per execution
>  1717  @3, GEN3_MGOWE, STAT, redo synch time (usec)                   , 11902840,    968.69k,      ,             ,          ,          , 33.53k per execution
> *1717  @3, GEN3_MGOWE, STAT, redo synch time overhead (usec)          ,  7933229,    645.63k,      ,             ,          ,          , 274.72ms FG wakeup overhead per log file sync*
>  1717  @3, GEN3_MGOWE, STAT, redo synch time overhead count ( 2ms)    ,        2,        .16,      ,             ,          ,          , .06 per execution
>  1717  @3, GEN3_MGOWE, STAT, redo synch time overhead count (inf)     ,        4,        .33,      ,             ,          ,          , .01 per execution
>  1717  @3, GEN3_MGOWE, STAT, redo synch writes                        ,        6,        .49,      ,             ,          ,          , .1 per execution
>  1717  @3, GEN3_MGOWE, STAT, redo synch long waits                    ,        4,        .33,      ,             ,          ,          , .02 per execution
>  1717  @3, GEN3_MGOWE, STAT, redo write info find                     ,        6,        .49,      ,             ,          ,          , .08 per execution
>  1717  @3, GEN3_MGOWE, STAT, undo change vector size                  ,     5664,     460.95,      ,             ,          ,          , 101.03 per execution
>  1717  @3, GEN3_MGOWE, STAT, table fetch by rowid                     ,        6,        .49,      ,             ,          ,          , 3.33 per execution
>  1717  @3, GEN3_MGOWE, STAT, rows fetched via callback                ,        6,        .49,      ,             ,          ,          , .35 per execution
> *1717  @3, GEN3_MGOWE, WAIT, log file sync                            , 12202261,   993.06ms, 99.3%, [WWWWWWWWWW],         6,       .49, 2.03s average wait*
>  1717  @3, GEN3_MGOWE, WAIT, SQL*Net message to client                ,       98,     7.98us,   .0%, [          ],        60,      4.88, 1.63us average wait
>  1717  @3, GEN3_MGOWE, WAIT, SQL*Net message from client              ,    94111,     7.66ms,   .8%, [W         ],        60,      4.88, 1.57ms average wait
>
> -- End of Stats snap 247, end=2020-02-05 17:41:57, seconds=12.3
>
> LGWR (as seen in another trace file) was almost completely idle during the
> problem time.
>
> Also, from the wait events and stats we can see that LGWR did all the work
> by itself; the LG* slaves weren't active (other than polling the current SCN
> very briefly in one snapshot).
>
>  1954  @3, (LG00)    , WAIT, LGWR worker group idle                   , 12000000,    962.4ms, 96.2%, [WWWWWWWWWW],         0,         0, 12s average wait
>  2198  @3, (LG01)    , WAIT, LGWR worker group idle                   , 12000000,    962.4ms, 96.2%, [WWWWWWWWWW],         0,         0, 12s average wait
>
> Since we don't see any FG metrics like *redo synch polls* and *redo synch
> poll writes*, it looks like you're using the plain old post/wait mechanism
> with LGWR. This looks like a case of a missed post to me.
>
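> For a quick check of your own, a minimal sketch querying those polling
> statistics by name (names as quoted above; :sid is a placeholder for the
> FG session). If they stay at 0, the session is still on plain post/wait
> rather than the polling mode:
>
> SQL> SELECT n.name, s.value
>      FROM   v$sesstat s
>      JOIN   v$statname n ON n.statistic# = s.statistic#
>      WHERE  s.sid = :sid
>      AND    n.name IN ('redo synch polls', 'redo synch poll writes');
>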
> As I understand it, this is not production, so you could use X$TRACE to dig
> deeper into the POST/WAIT messages. X$TRACE is actually always enabled anyway
> (just not all events), so it should be safe for prod too; at least I haven't
> crashed anything with it yet.
>
> You'd need to enable KST event 10005 at level 4 for just one test session
> and LGWR. I've used the Oracle PID below: 64 is the Oracle PID (not SPID)
> of my session. You'd need to do the same for LGWR.
>
> SQL> ALTER TRACING ENABLE "10005:4:64";
>
> Tracing altered.
>
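> To cover LGWR as well, a minimal sketch under the same syntax (the PID
> lookup is an addition here, and <lgwr_oracle_pid> is a placeholder for the
> value the query returns):
>
> SQL> SELECT pid FROM v$process WHERE pname = 'LGWR';   -- Oracle PID, not SPID
>
> SQL> ALTER TRACING ENABLE "10005:4:<lgwr_oracle_pid>";
>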
> And since X$TRACE is a circular buffer, you'd need to quickly dump the
> X$TRACE buckets to disk or just CTAS it somewhere whenever you see a long
> log file sync. You could use some PL/SQL loop that polls ASH or V$SESSION
> and does such an INSERT-SELECT from X$TRACE whenever it sees anyone waiting
> for "log file sync" for longer than 1 sec. You could also make the X$TRACE
> buffers bigger to have a bit more time to react.
>
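> A rough sketch of such a watcher (run as SYS, since X$TRACE is only visible
> to SYS; the XTRACE_DUMP table name, the 1-second threshold and the polling
> interval are assumptions for illustration, not anything from this thread):
>
> SQL> CREATE TABLE xtrace_dump AS SELECT * FROM x$trace WHERE 1 = 0;
>
> SQL> DECLARE
>        l_waiters NUMBER;
>      BEGIN
>        LOOP
>          -- anyone stuck in "log file sync" for more than a second?
>          SELECT COUNT(*) INTO l_waiters
>          FROM   v$session
>          WHERE  event = 'log file sync'
>          AND    state = 'WAITING'
>          AND    seconds_in_wait > 1;
>
>          IF l_waiters > 0 THEN
>            -- dump the circular buffer before it wraps, then stop
>            INSERT INTO xtrace_dump SELECT * FROM x$trace;
>            COMMIT;
>            EXIT;
>          END IF;
>
>          DBMS_SESSION.SLEEP(0.2);   -- 18c+; use DBMS_LOCK.SLEEP on older versions
>        END LOOP;
>      END;
>      /
>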
> The output would give you a timeline of session wait events *and* POSTs
> sent and received. You can use the "relative_time_sec" column to see when
> the POSTs were actually sent by your FG and received by LGWR ... and vice
> versa.
>
> SQL> @xt sid=509
>
> relative_time_sec   event  file.c          FUNCTION               sid  data
> ------------------  -----  --------------  ---------------------  ---  --------------------------------------------------------------------------------------------------
> 634363269.049747    10005  ksl2.c@3349     kslwtbctx              509  KSL WAIT BEG [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2764 seq_num=2766 snap_id=1
> 634363279.784638    10005  ksl2.c@3492     kslwtectx              509  KSL WAIT END [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2764 seq_num=2766 snap_id=1
> 634363279.784641    10005  ksl2.c@3496     kslwtectx              509  KSL WAIT END wait times (usecs) - snap=10734887, exc=10734887, tot=10734887
> 634363279.784850    10005  ksl2.c@19892    ksl_update_post_stats  509  *KSL POST SENT postee=20 num=3 loc='ksa2.h LINE:292 ID:ksasnd' flags=0x0 id1=0 id2=0*
> 634363279.784983    10005  ksl2.c@3349     kslwtbctx              509  KSL WAIT BEG [Disk file operations I/O] 8/0x8 0/0x0 8/0x8 wait_id=2765 seq_num=2767 snap_id=1
> 634363279.785058    10005  ksl2.c@3492     kslwtectx              509  KSL WAIT END [Disk file operations I/O] 8/0x8 0/0x0 8/0x8 wait_id=2765 seq_num=2767 snap_id=1
> 634363279.785060    10005  ksl2.c@3496     kslwtectx              509  KSL WAIT END wait times (usecs) - snap=77, exc=77, tot=77
> 634363279.785207    10005  ksl2.c@3349     kslwtbctx              509  *KSL WAIT BEG [log file sync] 4172/0x104c 143746492/0x89165bc 0/0x0 wait_id=2766 seq_num=2768 snap_id=1*
> 634363279.785888    10005  ksl2.c@18157    ksliwat                509  KSL FACILITY WAIT fac#=3 time_waited_usecs=678
> 634363279.785895    10005  ksl2.c@18276    ksliwat                509  *KSL POST RCVD poster=22 num=8332 loc='ksl2.h LINE:4384 ID:kslpsr' id1=162 id2=0 name=EV type=0 fac#=3 posted=0x3 may_be_posted=1*
> 634363279.785904    10005  ksl2.c@3492     kslwtectx              509  *KSL WAIT END [log file sync] 4172/0x104c 143746492/0x89165bc 0/0x0 wait_id=2766 seq_num=2768 snap_id=1*
> 634363279.785905    10005  ksl2.c@3496     kslwtectx              509  KSL WAIT END wait times (usecs) - snap=685, exc=685, tot=685
> 634363279.785948    10005  ksl2.c@3349     kslwtbctx              509  KSL WAIT BEG [SQL*Net message to client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2767 seq_num=2769 snap_id=1
> 634363279.785952    10005  ksl2.c@3492     kslwtectx              509  KSL WAIT END [SQL*Net message to client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2767 seq_num=2769 snap_id=1
> 634363279.785953    10005  ksl2.c@3496     kslwtectx              509  KSL WAIT END wait times (usecs) - snap=6, exc=6, tot=6
> 634363279.785964    10005  ksl2.c@3349     kslwtbctx              509  KSL WAIT BEG [SQL*Net message from client] 1952673792/0x74637000 1/0x1 0/0x0 wait_id=2768 seq_num=2770 snap_id=1
>
> The *KSL POST SENT postee=20* line shows you the Oracle PID of LGWR in the
> above example. My FG session sent a wakeup message to LGWR.
>
> The *KSL POST RCVD poster=22* line shows that the "OK" reply came back from
> another process (Oracle PID 22). This was the LG00 log writer slave in my
> case. The LGnn slaves can do post-back to the foregrounds for scalability
> (so that LGWR itself wouldn't spend so much time in the semaphore
> operating system calls).
>
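> If you want to confirm which process a given Oracle PID maps to, a quick
> lookup sketch (the PID values 20 and 22 are just the ones from the example
> above):
>
> SQL> SELECT pid, spid, pname FROM v$process WHERE pid IN (20, 22);
>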
> At the end of the day, this would still just confirm that LGWR misses a
> post for some reason; you'd still need to find which bug causes it.
>
> This doesn't seem to be Exadata, right? LGWR and commit processing with
> Exafusion & Smart Fusion Block Transfer have gotten more complex (and
> faster): the GC layer doesn't have to wait for log file sync completion
> before shipping a modified block to a remote instance, but commits on that
> remote instance still have to confirm that the log write on the original
> instance has completed. So there are additional potential
> synchronization/wait points in that configuration. This feature should be
> Exadata-only, but maybe some artifacts from that implementation affect the
> non-Exa config too.
>
> One thing of interest: the FG snapper output shows a short "IPC group service
> call" wait event right after a long log file sync ended, and that's related
> to the Exafusion KSIPC layer.
>
> P.S. For people wondering, I talk about X$TRACE and advanced wait event &
> post-wait topics in this hacking session:
> https://www.youtube.com/watch?v=mkmvZv58W6w
>
> --
> Tanel Poder
> https://tanelpoder.com/seminar
>
> On Thu, Feb 6, 2020 at 1:55 PM Chris Stephens <cstephens16_at_gmail.com>
> wrote:
>
>> At Sayan's excellent suggestion here is the full snapper output from
>> foreground sessions and lg* processes:
>>
>> https://gist.github.com/cs2018ncsa/e9b0d442247f1434ceef21124bef18c3
>>
>> On Thu, Feb 6, 2020 at 10:17 AM Chris Stephens <cstephens16_at_gmail.com>
>> wrote:
>>
>>> Resending w/ inline snapper foreground output of the max "FG wakeup overhead
>>> per log file sync" time while I wait for the moderator to approve the previous
>>> email w/ attachments of LGWR and foreground snapper output from the problem
>>> interval occurring later in the evening last night. Like I said in the previous
>>> email, I'm not exactly sure how to zero in on the issue, but the "FG wakeup.."
>>> times look extremely suspicious to me.
>>>
>>> FG:
>>>
>>> *** 2020-02-05T17:44:15.593425-06:00
>>>  1474  @3, GEN3_MGOWE, WAIT, SQL*Net message from client      , 12000000,   953.02ms, 95.3%, [WWWWWWWWWW],         0,         0, 12s average wait
>>>  1717  @3, GEN3_MGOWE, STAT, Requests to/from client          ,       54,       4.29,      ,             ,          ,          , .96 per execution
>>>  1717  @3, GEN3_MGOWE, STAT, opened cursors cumulative        ,       22,       1.75,      ,             ,          ,          , .7 per execution
>>>  1717  @3, GEN3_MGOWE, STAT, user commits                     ,        5,         .4,      ,             ,          ,          , .11 per execution
>>>  1717  @3, GEN3_MGOWE, STAT, user calls                       ,       74,       5.88,      ,             ,          ,          , 1.3 per execution
>>>  1717  @3, GEN3_MGOWE, STAT, recursive calls                  ,       21,       1.67,      ,             ,          ,          , 251.62us recursive CPU per recursive call
>>>  1717  @3, GEN3_MGOWE, STAT, session logical reads            ,      167,      13.26,      ,             ,          ,          , 20.64k total buffer visits
>>>  1717  @3, GEN3_MGOWE, STAT, CPU used when call started       ,        2,        .16,      ,             ,          ,          , 760ms total CPU used when call started
>>>  1717  @3, GEN3_MGOWE, STAT, CPU used by this session         ,        2,        .16,      ,             ,          ,          , 760ms total CPU used by this session
>>>  1717  @3, GEN3_MGOWE, STAT, DB time                          ,     1016,      80.69,      ,             ,          ,          , 12.8 per execution
>>>  1717  @3, GEN3_MGOWE, STAT, non-idle wait time               ,     1234,         98,      ,             ,          ,          , 12.93 per execution
>>>  1717  @3, GEN3_MGOWE, STAT, non-idle wait count              ,      194,      15.41,      ,             ,          ,          , 3.72 per execution
>>>  1717  @3, GEN3_MGOWE, STAT, process last non-idle time       ,       10,        .79,      ,             ,          ,          , 1.33M per execution
>>>  1717  @3, GEN3_MGOWE, STAT, messages sent                    ,        6,        .48,      ,             ,          ,          , .14 per execution
>>>

--
http://www.freelists.org/webpage/oracle-l