Re: intermittent long "log file sync" waits

From: Tim Gorman <tim.evdbt_at_gmail.com>
Date: Tue, 28 Jan 2020 09:42:36 -0800
Message-ID: <d70d59ac-9dd0-1704-de20-370a43a1c1a2_at_gmail.com>



Chris,

This would be a good use-case for 10046 extended tracing, just sayin'...

But looking at the differences between the two profiles (i.e. "fast" and "slow"), the number of calls differs by just 1 call on "log file sync", "row cache lock" and "DLM cross-inst call completion", but for "library cache lock" and "library cache pin" there are 3 more and 16 less calls, respectively.

So, "log file sync" might be a root cause, but pinning objects in the Library Cache might also?  Problems getting locks in the library cache might corroborate the lack of any waits exhibited by "iostat", as the Library Cache exists entirely in memory (within the Shared Pool in the SGA).

But before obtaining a trace, may I first take a swing for the fence in the form of a SWAG?  Can you check GV$SGA_RESIZE_OPS and verify if the Shared Pool is being reduced in size in one (or more) of the RAC instances at the time when the "slow" profile is happening?

Thanks,

-Tim

On 1/28/2020 8:09 AM, Chris Stephens wrote:
> 3-node Oracle 19.3 RAC
> Centos 7
>
> We have a SQLAlchemy/Python based application workload that is running
> the exact same steps with widely varying response times which appear
> to be related to varying "log file sync" wait times.
>
> Here is a profile of a "fast" run:
>
> CALL-NAME                        DURATION       %   CALLS      MEAN  
>     MIN        MAX
> ------------------------------  ---------  ------  ------  --------
>  --------  ---------
> SQL*Net message from client     53.197782   91.8%  10,092  0.005271
>  0.000177  28.568493
> EXEC                             3.759177    6.5%   9,816  0.000383
>  0.000000   0.239592
> row cache lock                   0.233153    0.4%     541  0.000431
>  0.000113   0.000941
> PARSE                            0.140399    0.2%   4,867  0.000029
>  0.000000   0.006620
> DLM cross inst call completion   0.137330    0.2%     956  0.000144
>  0.000004   0.000505
> library cache lock               0.100171    0.2%     215  0.000466
>  0.000151   0.002133
> library cache pin                0.079729    0.1%     216  0.000369
>  0.000056   0.000710
> FETCH                            0.058253    0.1%   1,062  0.000055
>  0.000000   0.004148
> log file sync                    0.048217    0.1%     149  0.000324
>  0.000259   0.000505
> CLOSE                            0.045416    0.1%   4,929  0.000009
>  0.000000   0.000073
> 20 others                        0.135624    0.2%  11,854  0.000011
>  0.000000   0.001700
> ------------------------------  ---------  ------  ------  --------
>  --------  ---------
> TOTAL (30)                      57.935251  100.0%  44,697  0.001296
>  0.000000  28.568493
>
> Here is a profile of a "slow" run:
>
> CALL-NAME DURATION       %   CALLS      MEAN       MIN         MAX
> ------------------------------  ----------  ------  ------  --------
>  --------  ----------
> SQL*Net message from client     131.186118   61.0%  10,092  0.012999
>  0.000212  106.789360
> log file sync                    79.291166   36.8%     150  0.528608
>  0.000264    2.986575
> EXEC                              3.728402    1.7%   9,816  0.000380
>  0.000000    0.221403
> row cache lock                    0.248868    0.1%     542  0.000459
>  0.000111    0.001036
> PARSE                             0.164267    0.1%   4,867  0.000034
>  0.000000    0.004652
> DLM cross inst call completion    0.146981    0.1%     957  0.000154
>  0.000005    0.001188
> library cache lock                0.104354    0.0%     218  0.000479
>  0.000160    0.000728
> library cache pin                 0.082504    0.0%     202  0.000408
>  0.000157    0.000672
> FETCH                             0.056687    0.0%   1,062  0.000053
>  0.000000    0.003969
> CLOSE                             0.043590    0.0%   4,929  0.000009
>  0.000000    0.000180
> 20 others                         0.142044    0.1%  11,866  0.000012
>  0.000000    0.001792
> ------------------------------  ----------  ------  ------  --------
>  --------  ----------
> TOTAL (30)                      215.194981  100.0%  44,701  0.004814
>  0.000000  106.789360
>
> looking at even histogram for that event:
>
> SQL> _at_evh "log file sync"
>        EVH_EVENT    EVH_WAIT_TIME_MILLI    WAIT_COUNT  EVH_EST_TIME  
>                     LAST_UPDATE_TIME
> ________________ ______________________ _____________ _______________
> ______________________________________
> log file sync                < 1               200051     100.026
> 27-JAN-20 11.39.57.344734 PM -06:00
> log file sync                < 2                  165       0.248
> 28-JAN-20 12.18.10.429089 AM -06:00
> log file sync                < 4                  150        0.45
> 27-JAN-20 11.18.31.158102 PM -06:00
> log file sync                < 8                  199       1.194
> 27-JAN-20 11.19.14.209947 PM -06:00
> log file sync               < 16                  253       3.036
> 28-JAN-20 08.03.17.851328 AM -06:00
> log file sync               < 32                  472      11.328
> 27-JAN-20 11.20.22.746033 PM -06:00
> log file sync               < 64                  728      34.944
> 28-JAN-20 01.13.37.364541 AM -06:00
> log file sync              < 128                  691      66.336
> 27-JAN-20 11.31.37.400504 PM -06:00
> log file sync              < 256                  414      79.488
> 28-JAN-20 12.18.10.423987 AM -06:00
> log file sync              < 512                  405      155.52
> 28-JAN-20 03.27.50.540383 AM -06:00
> log file sync             < 1024                  459     352.512
> 27-JAN-20 11.35.14.378363 PM -06:00
> log file sync             < 2048                  482     740.352
> 28-JAN-20 01.18.20.556248 AM -06:00
> log file sync             < 4096                  576    1769.472
> 27-JAN-20 11.21.05.084998 PM -06:00
> log file sync             < 8192                   89     546.816
> 27-JAN-20 11.57.36.436460 AM -06:00
> log file sync            < 16384                   60      737.28
> 25-JAN-20 07.48.31.460408 AM -06:00
> log file sync            < 32768                   39     958.464
> 27-JAN-20 11.59.09.869286 AM -06:00
> log file sync            < 65536                   27    1327.104
> 25-JAN-20 09.49.13.856563 AM -06:00
>
> The weird thing is that I don't see corresponding log I/O waits
> (awaits) in iostat output.
>
> I have a ticket open w/ oracle but does anyone have any suggestions to
> discover root cause and/or solution?

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 28 2020 - 18:42:36 CET

Original text of this message