Return-Path: <oracle-l-bounce@freelists.org>
X-Original-To: oracle-l@orafaq.com
Delivered-To: oracle-l@orafaq.com
Received: from turing.freelists.org (turing.freelists.org [206.53.239.180])
 by malta2546.startdedicated.com (Postfix) with ESMTPS id 4DB301002C5404
 for <oracle-l@orafaq.com>; Sun,  2 Feb 2020 16:36:53 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 4CA9D21335;
 Sun,  2 Feb 2020 10:36:49 -0500 (EST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=freelists.org;
 s=turing; t=1580657809;
 bh=Oh3RtsOWJtaN+4VopCEaHBvsH5cCPrpi7t+xuB7w5ls=;
 h=From:Sender:Sender:From;
 b=uY7SF7kcRs5bM6zFJl9nKrv90B/vVCc/uJY4basgwxi6CEsYAli0YLUCR+0KIrgy7
	 nJbGVnKbwCuSnEOz7IaH9bycVTcZqc8RbLVbT2jnjAWxoZSyaiB+h0W/c3gwXC1aHL
	 r0G8AF65ZVg53MG5JC4SLBSOkreiOYnb6BpUIkyg=
X-Virus-Scanned: Debian amavisd-new at turing.freelists.org
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id hV9GHYof7GDw; Sun,  2 Feb 2020 10:36:48 -0500 (EST)
Received: from turing.freelists.org (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 8695622C39;
 Sun,  2 Feb 2020 10:36:00 -0500 (EST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=freelists.org;
 s=turing; t=1580657804;
 bh=Oh3RtsOWJtaN+4VopCEaHBvsH5cCPrpi7t+xuB7w5ls=;
 h=From:Sender:Sender:From;
 b=c/A71CTzt3TGO2JCKT3n30mtYheDvYBlrsRYbxOoOpMotrwBg1R5rw0z6ENYxGsbT
	 vGdIkZihszwaCaTOrYmllntJf7iWnxRxAgu8a2XHDwqVaKJJdy5mi1afZ1RETS2mAE
	 KBTD7ttWuBW/A6Ga+RVhwLrGjhkfIMtVu3ctULaM=
Received: with ECARTIS (v1.0.0; list oracle-l); Sun, 02 Feb 2020 10:35:14 -0500 (EST)
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 367D924756
 for <Oracle-L@freelists.org>; Sun,  2 Feb 2020 10:35:14 -0500 (EST)
Authentication-Results: turing.freelists.org;
 dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="j5YsOtfO";
 dkim-atps=neutral
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id cituokwVgBkq for <Oracle-L@freelists.org>;
 Sun,  2 Feb 2020 10:35:13 -0500 (EST)
Received: from mail-qt1-f181.google.com (mail-qt1-f181.google.com [209.85.160.181])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 7242122D4A
 for <Oracle-L@freelists.org>; Sun,  2 Feb 2020 10:35:13 -0500 (EST)
Received: by mail-qt1-f181.google.com with SMTP id t13so9507960qto.3
        for <Oracle-L@freelists.org>; Sun, 02 Feb 2020 07:35:13 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=A0qPPGjY6U1k4JWtChgJ0mGNpSq95ZMi4hniMNYvz5o=;
        b=UD347CN6n+4HuCZZpjjEZojmTc63yzUW28VYkLPUMZGpQIQxv4SSphhDfnUAn5AIK9
         sFDLAsSq4SMIsS2eKJaUbYbgIGs7uuAg0m8lBBH3MZYGWZz/UPvVc+jTVXXM8IJhPv3S
         msNPilL7BGki7nPnoGLumkpq7FUwtikIKSfZ50vD+4wMoQDfDpePrImStECX1a5oObfD
         WBOqTdOxIF4XTWOzu2HeAKBt2J95UtJarnncbCDhU9++txTI/cvveefgJyzM93ko3YQP
         SrhEt5gEhjhq5TXjn1OVI/TZMD8Gq3vAagEwXeyPxYvo7mPu30hxq8oBDGjIlrTYXY/X
         lUoA==
X-Gm-Message-State: APjAAAXnt3IJ0Kuu7Adt89ETz4Bbn3h5iTCcrFgTPxMp53ZmnPSnJbLi
 f7DWJU9gdTEFNM51mtB87zooqfzpNcHvtVPYRgKVNHnl
X-Google-Smtp-Source: APXvYqwlhbePsp9GiYsUjNg27Df225xGA/CLez+QPTVq5bFF4ErjBPA0dF+TG3QHN8mODo3iGSRiCb0cKz15+jLzXVc=
X-Received: by 2002:ac8:1198:: with SMTP id d24mr19831408qtj.105.1580657712587;
 Sun, 02 Feb 2020 07:35:12 -0800 (PST)
MIME-Version: 1.0
References: <CAEFL0szEBH0P-hqb39Pmjw-pdJ60OZ+RUuFoqCXsC7+=iPEFzQ@mail.gmail.com>
 <38763_1580228753_5E306091_38763_12713_1_cf89e1868e2e40949f6177bf1c604696@vontobel.com>
 <CAEFL0swS2CGeDF2pGWHDBQkO8J1cH7cMbDCxBngx613UW52A+Q@mail.gmail.com>
 <39760_1580232621_5E306FAD_39760_6519_1_238f704e660247f78d5e2f6a5c98b27f@vontobel.com>
 <CAEFL0szP5+-ZUog8Q=P0TiusLw6MpDDM46hZM=7ps0v0o3p_OQ@mail.gmail.com>
 <38763_1580238546_5E3086D2_38763_18592_1_8d9cba2aa89d4aba82f0b60173245848@vontobel.com>
 <CAEFL0szn17WU9kSj7ipRhieXd4w_3hzhU5gSmxDxhx1=WnqDkw@mail.gmail.com>
 <CAMHX9J+FooJWWr896aFx7oqeMk=PKA__GaHjxjU_gsOyE1Tzjg@mail.gmail.com>
 <CAEFL0sx7fAF89Qike6PS2Jy9-N5S-w-r31XQ7kj8AEGP5FonTg@mail.gmail.com>
 <CAEFL0syR4Xy2fPiL4M3jtmmfj_02MmYgC_haCM=LDZ17ikuGgg@mail.gmail.com>
 <CAMHX9JLchyEEvh4mHH8hHXEa53r5FP4Ck-kpCMLUiTgOJzYuEw@mail.gmail.com>
 <CAMHX9JLW_i_c+7V0jMwXhWeAKNvUhEdVkf0sKEyHg62XyZP3aA@mail.gmail.com> <5571F2DD-2C37-44EB-AAF6-A692E0779AD1@gmail.com>
In-Reply-To: <5571F2DD-2C37-44EB-AAF6-A692E0779AD1@gmail.com>
From: Chris Stephens <cstephens16@gmail.com>
Date: Sun, 2 Feb 2020 09:35:01 -0600
Message-ID: <CAEFL0sxtrXooGYrb52urb200nqMEmGiEH5k-wUNWzPyVT8+cFQ@mail.gmail.com>
Subject: Re: intermittent long "log file sync" waits
To: Frits Hoogland <frits.hoogland@gmail.com>
Cc: Tanel Poder <tanel@tanelpoder.com>, Noveljic Nenad <nenad.noveljic@vontobel.com>,
 oracle-l <Oracle-L@freelists.org>
Content-Type: multipart/alternative; boundary="0000000000007868c6059d998d60"
X-archive-position: 76134
X-ecartis-version: Ecartis v1.0.0
Sender: oracle-l-bounce@freelists.org
Errors-to: oracle-l-bounce@freelists.org
X-original-sender: cstephens16@gmail.com
Precedence: normal
Reply-To: cstephens16@gmail.com
List-Help: <mailto:ecartis@freelists.org?Subject=help>
List-Unsubscribe: <mailto:oracle-l-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: oracle-l <oracle-l.freelists.org>
X-List-ID: oracle-l <oracle-l.freelists.org>
List-Subscribe: <mailto:oracle-l-request@freelists.org?Subject=subscribe>
List-Owner: <mailto:>
List-post: <mailto:oracle-l@freelists.org>
List-Archive: <http://www.freelists.org/archives/oracle-l>
X-list: oracle-l
--0000000000007868c6059d998d60
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thanks for the detailed posts Fritz! Very helpful/enlightening.  I will
check those lgwr stats if/when we can reproduce the issue since all
instances have been restarted since last occurrence.



On Sun, Feb 2, 2020 at 6:51 AM Frits Hoogland <frits.hoogland@gmail.com>
wrote:

> I didn=E2=80=99t look that deep, I tried to get an idea of the situation =
and the
> current state.
>
> This, as others have concluded already, then looks like the bug that seem=
s
> to surface every once in a while where sessions that use post-wait for
> waiting for the logwriter get =E2=80=9Cforgotten=E2=80=9D when the instan=
ce switches to
> polling.
> The switching is visible, or at least was visible when I looked at it in
> versions 11.2 and 12.1.0.2, by looking at the logwriter trace file, it
> mentions when the logwriter switches between post-wait and polling and ba=
ck.
> However, in these cases, the process would way longer, until the logwrite=
r
> decided to switch back to post-wait and found the waiting sessions again.
>
> When this is a bug with the above described issue, of course it needs to
> be fixed. However, there is a way around it:
> - In post-wait mode, a process waiting in =E2=80=98log file sync=E2=80=99=
 can only end the
> wait if two conditions are met: it needs to be posted and the on_disk_scn
> must be beyond the process=E2=80=99 commit SCN. Based on what is shown, I=
 assume
> the logwriter happily gone on and had the on disk scn updated (it does th=
at
> always every 3 seconds) probably beyond the process=E2=80=99 commit scn. =
So the
> most logical thing the process is waiting for, is to be posted, which did
> not happen.
> - In polling mode, a process will wait for a pre-computed time, during
> which it can not be notified, and then check for the on disk scn. The
> pre-computed time in normal circumstances is really low, probably around =
a
> millisecond, but computed based on the latency of your system. The
> important bit is there is no need for the logwriter to do anything: the
> waiting is based on the progress that the logwriter makes by means of the
> on disk scn, but it doesn=E2=80=99t require additional work, like being p=
osted.
> - Therefore, if you fix your instances to use polling only, it can not
> come in the situation of a post-wait process being forgotten. This can be
> achieved by setting =E2=80=9C_use_adaptive_log_file_sync=E2=80=9D to poll=
ing_only. This is
> an underscore parameter, and therefore you have to ask blessing from orac=
le
> support to use it, but since you already got a SR open, this should be
> simple to do.
>
> I can=E2=80=99t think of any downsides of using polling_only. Maybe someo=
ne can?
> Even the use of it, since it=E2=80=99s switched to dynamically by all sup=
ported
> versions, is high.
>
> Frits Hoogland
>
> http://fritshoogland.wordpress.com
> frits.hoogland@gmail.com
> Mobile: +31 6 14180860
>
> On 2 Feb 2020, at 04:50, Tanel Poder <tanel@tanelpoder.com> wrote:
>
> But your trace file here already indicates that there's some glitch
> regarding LGWR wakeups/posting:
>
> *Foreground:*
>
> *** 2020-01-29T*13:04:20*.724666-06:00
> WAIT #0: nam=3D'log file sync' ela=3D 6909711 buffer#=3D58996 sync scn=3D=
976089279
> p3=3D0 obj#=3D-1 tim=3D1025228531958
>
> *LGWR at the same time:*
>
> *** 2020-01-29T*13:04:20*.725994-06:00
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 2918443 timeout=3D300 p2=3D0 p3=
=3D0
> obj#=3D-1 tim=3D1025228533313
>
> The WAIT lines are printed out when the wait ends (and the *** timestamp
> line just before the WAIT line is printed out, if enough time has passed
> since the last *** line). So this shows that LGWR had happily been idle,
> waiting for its idle 3-second timeout to expire, or someone to wake it up
> (that's what "rdbms ipc message" means) - while someone had issued a comm=
it
> and was waiting for LGWR acknowledgment. So this shows that LGWR didn't
> even get the post from your session, nor any other sessions (if you had
> concurrent transactional activity). Back in 11g, typically these problems
> happened when adaptive log file syncing switch between poll and post/wait
> wakeup method happened and you hit a bug. But in 12c there are more movin=
g
> parts.
>
> --
> Tanel Poder
> https://blog.tanelpoder.com/seminar
>
> On Sat, Feb 1, 2020 at 10:34 PM Tanel Poder <tanel@tanelpoder.com> wrote:
>
>> Hi Chris,
>>
>> As 12c+ has log writer slaves and Oracle can adaptavely switch from
>> single LGWR to LGWR + LGxx slave (_use_single_log_writer=3Dadaptive), ca=
n you
>> check ASH data for the LGxx slaves at this time too. Not all
>> process-to-process interactions/waits are instrumented with a
>> blocking_session in Oracle...
>>
>> Should be this command:
>>
>> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>> "program like '%(LG%)'" "TIMESTAMP'2020-01-29 13:04:00'"
>> "TIMESTAMP'2020-01-29 13:05:00'"
>>
>> --
>> Tanel Poder
>> https://blog.tanelpoder.com/seminar
>>
>>
>> On Thu, Jan 30, 2020 at 3:44 PM Chris Stephens <cstephens16@gmail.com>
>> wrote:
>>
>>> Here's some additional ash output from the trace data posted in last
>>> message including event_hist.sql:
>>>
>>> SQL> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>>> 1=3D1 "TIMESTAMP'2020-01-29 13:04:20'" "TIMESTAMP'2020-01-29 13:04:27'"
>>>    TOTALSECONDS    AAS      %This         USERNAME           EVENT2
>>>  PROGRAM2    BLOCKING_SESSION    BLOCKING_INST_ID             FIRST_SEE=
N
>>>            LAST_SEEN    DIST_SQLEXEC_SEEN
>>> _______________ ______ __________ ________________ ________________
>>> _____________ ___________________ ___________________
>>> ______________________ ______________________ ____________________
>>>               2    0.3   50% |    SYS              ON CPU
>>> (sqlplus)                                             2020-01-29 13:04:=
24
>>>  2020-01-29 13:04:25                       1
>>>               1    0.1   25% |    GEN3_MGOWER_3    ON CPU
>>> (python)                                              2020-01-29 13:04:=
22
>>>  2020-01-29 13:04:22                       1
>>>               1    0.1   25% |    GEN3_MGOWER_3    log file sync
>>>  (python)                     1710                   2 2020-01-29 13:04=
:20
>>>    2020-01-29 13:04:20                       1
>>>
>>>
>>> SQL> @bg lgwr
>>>    NAME    DESCRIPTION     SID    OPID     SPID               PADDR
>>>           SADDR
>>> _______ ______________ _______ _______ ________ ___________________
>>> ___________________
>>> LGWR    Redo etc.         1710      34 26552    00000001E8718860
>>>  00000001D8BFF4A0
>>>
>>> SQL> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>>> session_id=3D1710 "TIMESTAMP'2020-01-29 13:04:20'" "TIMESTAMP'2020-01-2=
9
>>> 13:04:27'"
>>>
>>> no rows selected
>>>
>>> SQL> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>>> session_id=3D1710 "TIMESTAMP'2020-01-29 13:04:00'" "TIMESTAMP'2020-01-2=
9
>>> 13:05:00'"
>>>
>>> no rows selected
>>>
>>> SQL> @ash/event_hist "log file" 1=3D1  "TIMESTAMP'2020-01-29 13:04:00'"
>>> "TIMESTAMP'2020-01-29 13:05:00'"
>>>        EVH_EVENT       EVH_MILLISEC    EVH_SAMPLE_COUNT
>>>  EVH_EST_EVENT_COUNT    EVH_EST_TOTAL_SEC    PCT_EVT_TIME       EVH_GRA=
PH
>>>           FIRST_SEEN              LAST_SEEN
>>> ________________ __________________ ___________________
>>> ______________________ ____________________ _______________ ___________=
____
>>> ______________________ ______________________
>>> log file sync             < 1024                      1
>>>    1.8                  1.8            12.8 |#         |    2020-01-29
>>> 13:04:13    2020-01-29 13:04:13
>>> log file sync             < 4096                      1
>>>      1                  4.1            29.1 |###       |    2020-01-29
>>> 13:04:12    2020-01-29 13:04:12
>>> log file sync             < 8192                      1
>>>      1                  8.2            58.2 |######    |    2020-01-29
>>> 13:04:20    2020-01-29 13:04:20
>>>
>>> SQL> @ash/event_hist "write" 1=3D1  "TIMESTAMP'2020-01-29 13:04:00'"
>>> "TIMESTAMP'2020-01-29 13:05:00'"
>>>
>>> no rows selected
>>>
>>> SQL> @ash/event_hist "*" 1=3D1  "TIMESTAMP'2020-01-29 13:04:00'"
>>> "TIMESTAMP'2020-01-29 13:05:00'"
>>>         EVH_EVENT       EVH_MILLISEC    EVH_SAMPLE_COUNT
>>>  EVH_EST_EVENT_COUNT    EVH_EST_TOTAL_SEC    PCT_EVT_TIME       EVH_GRA=
PH
>>>           FIRST_SEEN              LAST_SEEN
>>> _________________ __________________ ___________________
>>> ______________________ ____________________ _______________ ___________=
____
>>> ______________________ ______________________
>>> log file sync              < 1024                      1
>>>    1.8                  1.8            12.8 |#         |    2020-01-29
>>> 13:04:13    2020-01-29 13:04:13
>>> log file sync              < 4096                      1
>>>      1                  4.1            29.1 |###       |    2020-01-29
>>> 13:04:12    2020-01-29 13:04:12
>>> log file sync              < 8192                      1
>>>      1                  8.2            58.2 |######    |    2020-01-29
>>> 13:04:20    2020-01-29 13:04:20
>>> row cache lock                < 1                      1
>>> 2469.1                  2.5             100 |##########|    2020-01-29
>>> 13:04:01    2020-01-29 13:04:01
>>>
>>> SQL> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>>> "lower(program) like '%lg%'" "TIMESTAMP'2020-01-29 13:04:00'"
>>> "TIMESTAMP'2020-01-29 13:05:00'"
>>>
>>> no rows selected
>>>
>>> i'm going to copy over gv$active_session_history to a single instance
>>> database and take a look at ash_wait_chains but i don't expect that to
>>> reveal anything either.
>>>
>>> i need to confirm whether adding redo resolved issue (will have
>>> developer fire up workload) but I think next step is to investigate x$t=
race
>>> and possible process communication issues?
>>>
>>> On Thu, Jan 30, 2020 at 10:27 AM Chris Stephens <cstephens16@gmail.com>
>>> wrote:
>>>
>>>> Hey Tanel,
>>>>
>>>> Thanks for taking time to write this up! Will definitely be attending
>>>> that webinar.
>>>>
>>>> comments inline
>>>>
>>>> On Wed, Jan 29, 2020 at 10:32 PM Tanel Poder <tanel@tanelpoder.com>
>>>> wrote:
>>>>
>>>>> Comments:
>>>>>
>>>>> 1) First, looks like you're using all the right tools for systematic
>>>>> troubleshooting ;-)
>>>>>
>>>>
>>>> ***i try!
>>>>
>>>>>
>>>>> 2) As "log file sync" is not directly an I/O wait event (IO happens
>>>>> indirectly under the hood), but "wait for LGWR to ACK/increment redo
>>>>> persisting position" then there may be multiple reasons for delays. I=
'm
>>>>> repeating some of what others have said here too:
>>>>>   - LGWR could be waiting for remote ACK (synchronous data guard) -
>>>>> that should show up as a LNS* wait or something like that though
>>>>>   - Process slowness due to CPU overload/scheduling latency or
>>>>> swapping (in both cases, LGWR may do its work fast, but it takes a wh=
ile
>>>>> for the foreground process to wake up and end the wait event - you co=
uld
>>>>> check if there are any reports regarding switching LGWR mode in alert=
.log
>>>>> around that time)
>>>>>   - Bug/missed post due to switching between LGWR progress polling vs=
.
>>>>> waiting for LGWR post
>>>>>   - ASM instance communication issues (ASM communication can be on th=
e
>>>>> critical path when creating/changing files on ASM and even for readin=
g,
>>>>> when the ASM metadata cache in your local instance doesn't have what =
it
>>>>> needs)
>>>>>   - So, the physical redo log file disk write is not necessarily the
>>>>> biggest contributor to the log file sync wait
>>>>>
>>>>> *** no data guard in the env.
>>>> *** we have excess CPU. (20 cores / instance and usually only 1-5
>>>> active foreground processes but i will confirm from TFA collections)
>>>> *** yesterday i was ready to blame bug. today, not so sure. i checked
>>>> log file switches with a script google found me a few days ago. now
>>>> thinking that was inaccurate as a script from metastink shows tones of
>>>> switching so that could very well be related.
>>>> ***asm communication is something i need to look into. TFA has reporte=
d
>>>> some ASM issues but there's no details and i never saw anything in asm
>>>> alert logs indicating an issue.
>>>>
>>>>
>>>>> 3) You'll need to measure what LGWR itself was doing during the
>>>>> problem time
>>>>>   - Since you're on 19c (12c or later) the LGWR has slaves too and
>>>>> they may be doing most of the work
>>>>>   - You'll need to look into both the LGWR process and any other LG*
>>>>> ones too
>>>>>   - Since you have (Oracle) Snapper running already and writing to
>>>>> trace you should have the data already
>>>>>   - The *lgwr* in the following @snapper ash,stats,trace 10 999999
>>>>> lgwr syntax includes LGWR slaves too
>>>>>   - As LGWR slaves have their own sessions (and SIDs), you should run
>>>>> ashtop etc and report if there's any BLOCKING_SESSION value listed fo=
r LGWR
>>>>> and slaves
>>>>>   - If LGWR and the slaves were listed *idle* during the problem time
>>>>> (while others waited for log file sync event at the same time) *and
>>>>> if *there was no huge OS level CPU/memory shortage at the time, this
>>>>> looks like a background <-> foreground communication problem.
>>>>>      - This (and any potential ASM issues) could probably be
>>>>> troubleshooted via KST tracing, KSR channels and X$TRACE that could h=
elp
>>>>> here (some of it is enabled by default ( Like I showed in a previous
>>>>> hacking session: https://www.youtube.com/watch?v=3DmkmvZv58W6w )
>>>>>
>>>>>
>>>> here is all the output i have corresponding to a 6 sec "log file sync"
>>>> wait:
>>>>
>>>> Experience duration: *R* =3D *t*1 =E2=88=92 *t*0 =3D 41.336137 seconds
>>>> Experience begin time: *t*0 =3D 2020-01-29T13:04:07.560478
>>>> Experience end time: *t*1 =3D 2020-01-29T13:04:48.896615
>>>> CALL-NAME                        DURATION       %   CALLS      MEAN
>>>>   MIN       MAX
>>>> ------------------------------  ---------  ------  ------  --------
>>>>  --------  --------
>>>> SQL*Net message from client     26.017105   66.5%   9,886  0.002632
>>>>  0.000209  7.858051
>>>> log file sync                   10.582409   27.1%     150  0.070549
>>>>  0.000295  6.909711
>>>> EXEC                             2.088757    5.3%   9,702  0.000215
>>>>  0.000000  0.209199
>>>> DLM cross inst call completion   0.109617    0.3%     212  0.000517
>>>>  0.000005  0.080575
>>>> row cache lock                   0.070118    0.2%     125  0.000561
>>>>  0.000208  0.000972
>>>> FETCH                            0.047586    0.1%   1,029  0.000046
>>>>  0.000000  0.003056
>>>> PARSE                            0.045293    0.1%   4,778  0.000009
>>>>  0.000000  0.002692
>>>> CLOSE                            0.033453    0.1%   4,784  0.000007
>>>>  0.000000  0.000237
>>>> library cache lock               0.023614    0.1%      55  0.000429
>>>>  0.000152  0.000901
>>>> library cache pin                0.022311    0.1%      55  0.000406
>>>>  0.000153  0.000689
>>>> 17 others                        0.071430    0.2%  10,772  0.000007
>>>>  0.000000  0.001487
>>>> ------------------------------  ---------  ------  ------  --------
>>>>  --------  --------
>>>> TOTAL (27)                      39.111693  100.0%  41,548  0.000941
>>>>  0.000000  7.858051
>>>>
>>>> 6 sec wait from raw data:
>>>>
>>>> *** 2020-01-29T13:04:20.724666-06:00
>>>> WAIT #0: nam=3D'log file sync' ela=3D 6909711 buffer#=3D58996 sync
>>>> scn=3D976089279 p3=3D0 obj#=3D-1 tim=3D1025228531958
>>>> WAIT #0: nam=3D'SQL*Net message to client' ela=3D 2 driver id=3D141369=
7536
>>>> #bytes=3D1 p3=3D0 obj#=3D-1 tim=3D1025228532118
>>>> WAIT #0: nam=3D'SQL*Net message from client' ela=3D 1711 driver
>>>> id=3D1413697536 #bytes=3D1 p3=3D0 obj#=3D-1 tim=3D1025228533852
>>>> BINDS #140204587734864:
>>>>
>>>> i enabled 10046 on lgwr:
>>>>
>>>> *** 2020-01-29T13:04:20.725994-06:00
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 2918443 timeout=3D300 p2=3D0=
 p3=3D0
>>>> obj#=3D-1 tim=3D1025228533313
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 279 files=3D1 blocks=
=3D17
>>>> requests=3D2 obj#=3D-1 tim=3D1025228533777
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 171 files=3D1 blocks=
=3D3
>>>> requests=3D2 obj#=3D-1 tim=3D1025228534104
>>>> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 22 copy latch #=3D28 p=
2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025228534182
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 183 files=3D1 blocks=
=3D2
>>>> requests=3D2 obj#=3D-1 tim=3D1025228534437
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 174 files=3D1 blocks=
=3D2
>>>> requests=3D2 obj#=3D-1 tim=3D1025228534721
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 31 timeout=3D8 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228534786
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 3233 timeout=3D8 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228538038
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 170 files=3D1 blocks=
=3D2
>>>> requests=3D1 obj#=3D-1 tim=3D1025228538297
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 25 timeout=3D8 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228538358
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 11829 timeout=3D8 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228550204
>>>> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 52 copy latch #=3D8 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025228550294
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 239 files=3D1 blocks=
=3D7
>>>> requests=3D1 obj#=3D-1 tim=3D1025228550624
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 23 timeout=3D6 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228550699
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16750 timeout=3D6 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228567467
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 207 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228567762
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 26 timeout=3D5 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228567825
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16249 timeout=3D5 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228584091
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 213 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228584400
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 48 timeout=3D3 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228584502
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16466 timeout=3D3 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228600993
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 228 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228601393
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 29 timeout=3D1 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228601482
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 9548 timeout=3D1 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228611053
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 7467 timeout=3D300 p2=3D0 p3=
=3D0
>>>> obj#=3D-1 tim=3D1025228618574
>>>> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 13 copy latch #=3D8 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025228618636
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 222 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228618960
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 31 timeout=3D300 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228619066
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 15946 timeout=3D300 p2=3D0 p=
3=3D0
>>>> obj#=3D-1 tim=3D1025228635045
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 240 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228635427
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 33 timeout=3D298 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228635525
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 15987 timeout=3D298 p2=3D0 p=
3=3D0
>>>> obj#=3D-1 tim=3D1025228651537
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 255 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228651912
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 51 timeout=3D296 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228652034
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16332 timeout=3D296 p2=3D0 p=
3=3D0
>>>> obj#=3D-1 tim=3D1025228668405
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 256 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228668804
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 37 timeout=3D295 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228668911
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16969 timeout=3D295 p2=3D0 p=
3=3D0
>>>> obj#=3D-1 tim=3D1025228685907
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 273 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228686327
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 38 timeout=3D293 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228686453
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 6908 timeout=3D293 p2=3D0 p3=
=3D0
>>>> obj#=3D-1 tim=3D1025228693390
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 247 files=3D1 blocks=
=3D2
>>>> requests=3D1 obj#=3D-1 tim=3D1025228693768
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 38 timeout=3D292 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228693862j
>>>>  ...more of the same
>>>> 264396 WAIT #0: nam=3D'rdbms ipc message' ela=3D 32 timeout=3D52 p2=3D=
0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231090742
>>>>  264397 WAIT #0: nam=3D'rdbms ipc message' ela=3D 19460 timeout=3D52 p=
2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025231110227
>>>>  264398 WAIT #0: nam=3D'log file parallel write' ela=3D 272 files=3D1
>>>> blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231110633
>>>>  264399 WAIT #0: nam=3D'rdbms ipc message' ela=3D 37 timeout=3D50 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231110748
>>>>  264400 WAIT #0: nam=3D'rdbms ipc message' ela=3D 15484 timeout=3D50 p=
2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025231126261
>>>>  264401 WAIT #0: nam=3D'log file parallel write' ela=3D 247 files=3D1
>>>> blocks=3D4 requests=3D1 obj#=3D-1 tim=3D1025231126641
>>>>  264402 WAIT #0: nam=3D'rdbms ipc message' ela=3D 36 timeout=3D49 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231126753
>>>>  264403 WAIT #0: nam=3D'rdbms ipc message' ela=3D 16732 timeout=3D49 p=
2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025231143514
>>>>  264404 WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 31 copy latch =
#=3D8
>>>> p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231143603
>>>>  264405 WAIT #0: nam=3D'log file parallel write' ela=3D 279 files=3D1
>>>> blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231143976
>>>>  264406 WAIT #0: nam=3D'rdbms ipc message' ela=3D 35 timeout=3D47 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231144093
>>>>  264407 WAIT #0: nam=3D'rdbms ipc message' ela=3D 5384 timeout=3D47 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231149515
>>>>  264408 WAIT #0: nam=3D'log file parallel write' ela=3D 255 files=3D1
>>>> blocks=3D4 requests=3D1 obj#=3D-1 tim=3D1025231149889
>>>>  264409 WAIT #0: nam=3D'rdbms ipc message' ela=3D 32 timeout=3D47 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231149989
>>>>  264410
>>>>  264411 *** 2020-01-29T13:04:23.802774-06:00
>>>>  264412 WAIT #0: nam=3D'rdbms ipc message' ela=3D 460056 timeout=3D46 =
p2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025231610086
>>>>  264413
>>>>  264414 *** 2020-01-29T13:04:26.797925-06:00
>>>>  264415 WAIT #0: nam=3D'rdbms ipc message' ela=3D 2995027 timeout=3D30=
0 p2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025234605238
>>>>  264416 WAIT #0: nam=3D'rdbms ipc message' ela=3D 10649 timeout=3D1 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025234616054
>>>>  264417
>>>>
>>>> i will get you snapper lgwr output associated w/ this interval as soon
>>>> as i can find it (can't remember where i stashed it and lost trace fil=
e
>>>> name .. fml)
>>>>
>>>> 4) Paste the full *Snapper* output (of one time snapshot from the
>>>>> problem time) here if you can - we'll be able to see what the slaves =
were
>>>>> doing
>>>>>   - You'd need to include not only LGWR SID but also any LG* slave
>>>>> SIDs too
>>>>>   - Also we'll see better LGWR delay metrics from V$SESSTAT that go
>>>>> beyond just wait events
>>>>>   - Thinking of metrics like these - snapper already handles them (if
>>>>> you look into the rightmost column)
>>>>>      - *redo synch time overhead (usec)* - "FG wakeup overhead per
>>>>> log file sync"
>>>>>      - *redo write time* - accounting end to end redo write time (not
>>>>> just the log file parallel write syscall) "per redo write"
>>>>>
>>>>> 5) Regarding the *ASH wait chains* error on 19c - I'm aware of this
>>>>> but forgot to look into it
>>>>>    - It has (probably) something to do with GV$ views (or PX slaves
>>>>> used for PDB V$ queries) and possibly the connect by I'm using
>>>>>    - Does it work in the CDB (if multitenant)?
>>>>>    - Does it work with lower optimizer features enabled?
>>>>>    - As we are already manually looking into what LGWR and its slaves
>>>>> are doing, there's no need for the wait chains script here
>>>>>
>>>>> ***oracle has filed a bug. altering optimizer_features_enabled doesn'=
t
>>>> help.
>>>>
>>>>
>>>>> 6) Linux Process Snapper - pSnapper
>>>>>    - If you want to see the full "wait" states of Linux
>>>>> processes/threads, use the* -a *option.
>>>>>    - It's somewhat like always showing "idle" threads too
>>>>>    - But for linux apps it's harder to determine what's idle and
>>>>> what's some network RPC wait, so I currently just do what Linux "load=
"
>>>>> metric does
>>>>>    - I just shows threads in Runnable and Disk/Uninterruptible sleep
>>>>> states by default (-a shows all states, including Sleep and kernel-Id=
le
>>>>> threads etc)
>>>>>    - So the Linux pSnapper "idle" means a somewhat different thing
>>>>> than Oracle "idle" - say LGWR was waiting for a Data Guard ACK, that =
should
>>>>> not be shown as Idle in Oracle, but at the Linux process level it's "=
just
>>>>> some network socket wait" and you'd need to use *-a* option to show
>>>>> all of them
>>>>>
>>>>
>>>> *** will add that option to future collections (and watch the
>>>> video/read help)
>>>>
>>>>>
>>>>> 7) Were LGWR/LG* processes mostly idle during the problem time?
>>>>>    - As I kept reading the thread, it looks like ASH reported LGWR
>>>>> being mostly idle during the problem time
>>>>>    - This is not exactly the same pattern, but I've seen cases in the
>>>>> past where everyone's waiting for LGWR and LGWR doesn't seem to be bu=
sy and
>>>>> doesn't show any significant waits
>>>>>    - It turned out that LGWR was actually waiting for CKPT to make
>>>>> progress, but it wasn't synchronously waiting, but just going to slee=
p and
>>>>> "checking again later" in a loop
>>>>>    - I wrote about this here:
>>>>> https://blog.tanelpoder.com/posts/log-file-switch-checkpoint-incomple=
te-and-lgwr-waiting-for-checkpoint-progress/
>>>>>    - So it may still make sense to check what LGWR/LG* processes are
>>>>> doing even if they don't seem to be active much (like do you see an
>>>>> occasional controlfile read wait by LGWR etc)
>>>>>    - SQL*Trace would capture all activity of LGWR, but may slow thing=
s
>>>>> down further, so you could also sample & spool V$SESSION_WAIT_HISTORY=
 (that
>>>>> keeps a log of last 10 waits of every session in memory)
>>>>>
>>>>> *** i think i've addressed most of this.
>>>>
>>>>
>>>>> Incidentally I'm doing another hacking session tomorrow (Thursday 30
>>>>> Jan @ 12pm ET) about complex Oracle performance problems that span mu=
ltiple
>>>>> layers and have multiple moving parts. I can add some LGWR troublesho=
oting
>>>>> magic in the end of that (here goes my plan to fit the session in 1.5=
h
>>>>> again!). I'll post the details in my blog in a few minutes.
>>>>>
>>>>> *** can't wait!!
>>>>
>>>>> --
>>>>> Tanel Poder
>>>>> https://blog.tanelpoder.com/seminar
>>>>>
>>>>>
>>>>> On Tue, Jan 28, 2020 at 2:22 PM Chris Stephens <cstephens16@gmail.com=
>
>>>>> wrote:
>>>>>
>>>>>> doesn't look like any ash data for lgwr:
>>>>>>
>>>>>> SQL> @ashtop session_id,blocking_session,event2 1=3D1
>>>>>> "TIMESTAMP'2020-01-27 15:50:48'" "TIMESTAMP'2020-01-27 15:52:40'"
>>>>>>    TOTALSECONDS    AAS      %This    SESSION_ID    BLOCKING_SESSION
>>>>>>              EVENT2             FIRST_SEEN              LAST_SEEN
>>>>>>  DIST_SQLEXEC_SEEN
>>>>>> _______________ ______ __________ _____________ ___________________
>>>>>> _____________________ ______________________ ______________________
>>>>>> ____________________
>>>>>>              81    0.7   82% |             1713                1710
>>>>>> log file sync         2020-01-27 15:50:54    2020-01-27 15:52:14
>>>>>>             1
>>>>>>               5      0    5% |             1221
>>>>>> ON CPU                2020-01-27 15:50:59    2020-01-27 15:52:14
>>>>>>             1
>>>>>>               3      0    3% |              246
>>>>>> ON CPU                2020-01-27 15:51:00    2020-01-27 15:52:00
>>>>>>             1
>>>>>>               3      0    3% |             1713
>>>>>> ON CPU                2020-01-27 15:50:51    2020-01-27 15:52:27
>>>>>>             2
>>>>>>               1      0    1% |              123
>>>>>> ON CPU                2020-01-27 15:51:35    2020-01-27 15:51:35
>>>>>>             1
>>>>>>               1      0    1% |              127
>>>>>> ON CPU                2020-01-27 15:51:50    2020-01-27 15:51:50
>>>>>>             1
>>>>>>               1      0    1% |              252                2321
>>>>>> latch free            2020-01-27 15:52:28    2020-01-27 15:52:28
>>>>>>             1
>>>>>>               1      0    1% |              978
>>>>>> ges remote message    2020-01-27 15:50:48    2020-01-27 15:50:48
>>>>>>             1
>>>>>>               1      0    1% |              983
>>>>>> latch free            2020-01-27 15:52:28    2020-01-27 15:52:28
>>>>>>             1
>>>>>>               1      0    1% |             1713
>>>>>> library cache lock    2020-01-27 15:50:48    2020-01-27 15:50:48
>>>>>>             1
>>>>>>               1      0    1% |             1831
>>>>>> ON CPU                2020-01-27 15:50:49    2020-01-27 15:50:49
>>>>>>             1
>>>>>>
>>>>>>
>>>>>> 11 rows selected.
>>>>>>
>>>>>>
>>>>>> SQL> @ashtop session_id,blocking_session,event2 session_id=3D1710
>>>>>> "TIMESTAMP'2020-01-27 15:50:48'" "TIMESTAMP'2020-01-27 15:52:40'"
>>>>>>
>>>>>> no rows selected
>>>>>>
>>>>>> On Tue, Jan 28, 2020 at 1:09 PM Noveljic Nenad <
>>>>>> nenad.noveljic@vontobel.com> wrote:
>>>>>>
>>>>>>> You=E2=80=99re using v0.2, but  v0.6 should be the latest version. =
0.2
>>>>>>> wouldn=E2=80=99t show idle blockers.
>>>>>>>
>>>>>>> I looked at the ashtop output once again. Strange that lgwr
>>>>>>> processes nor any other bg processes show up significantly there. T=
he
>>>>>>> question is, what was lgwr doing while the fg processes were waitin=
g on him.
>>>>>>>
>>>>>>>
>

--0000000000007868c6059d998d60
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Thanks for the detailed posts Fritz! Very helpful/enlighte=
ning.=C2=A0 I will check those lgwr stats if/when we can reproduce the issu=
e since all instances have been restarted since last occurrence.<div><br></=
div><div><br></div></div><br><div class=3D"gmail_quote"><div dir=3D"ltr" cl=
ass=3D"gmail_attr">On Sun, Feb 2, 2020 at 6:51 AM Frits Hoogland &lt;<a hre=
f=3D"mailto:frits.hoogland@gmail.com">frits.hoogland@gmail.com</a>&gt; wrot=
e:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0=
.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style=3D=
"overflow-wrap: break-word;">I didn=E2=80=99t look that deep, I tried to ge=
t an idea of the situation and the current state.<div><br></div><div>This, =
as others have concluded already, then looks like the bug that seems to sur=
face every once in a while where sessions that use post-wait for waiting fo=
r the logwriter get =E2=80=9Cforgotten=E2=80=9D when the instance switches =
to polling.</div><div>The switching is visible, or at least was visible whe=
n I looked at it in versions 11.2 and 12.1.0.2, by looking at the logwriter=
 trace file, it mentions when the logwriter switches between post-wait and =
polling and back.</div><div>However, in these cases, the process would way =
longer, until the logwriter decided to switch back to post-wait and found t=
he waiting sessions again.</div><div><br></div><div>When this is a bug with=
 the above described issue, of course it needs to be fixed. However, there =
is a way around it:</div><div>- In post-wait mode, a process waiting in =E2=
=80=98log file sync=E2=80=99 can only end the wait if two conditions are me=
t: it needs to be posted and the on_disk_scn must be beyond the process=E2=
=80=99 commit SCN. Based on what is shown, I assume the logwriter happily g=
one on and had the on disk scn updated (it does that always every 3 seconds=
) probably beyond the process=E2=80=99 commit scn. So the most logical thin=
g the process is waiting for, is to be posted, which did not happen.</div><=
div>- In polling mode, a process will wait for a pre-computed time, during =
which it can not be notified, and then check for the on disk scn. The pre-c=
omputed time in normal circumstances is really low, probably around a milli=
second, but computed based on the latency of your system. The important bit=
 is there is no need for the logwriter to do anything: the waiting is based=
 on the progress that the logwriter makes by means of the on disk scn, but =
it doesn=E2=80=99t require additional work, like being posted.=C2=A0</div><=
div>- Therefore, if you fix your instances to use polling only, it can not =
come in the situation of a post-wait process being forgotten. This can be a=
chieved by setting =E2=80=9C_use_adaptive_log_file_sync=E2=80=9D to polling=
_only. This is an underscore parameter, and therefore you have to ask bless=
ing from oracle support to use it, but since you already got a SR open, thi=
s should be simple to do.</div><div><br></div><div>I can=E2=80=99t think of=
 any downsides of using polling_only. Maybe someone can?</div><div>Even the=
 use of it, since it=E2=80=99s switched to dynamically by all supported ver=
sions, is high.<br><div>
<div style=3D"color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-=
indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div st=
yle=3D"color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:=
0px;text-transform:none;white-space:normal;word-spacing:0px"><div style=3D"=
color:rgb(0,0,0);font-family:Helvetica;font-style:normal;font-variant:norma=
l;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-w=
ebkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spac=
ing:0px"><div style=3D"color:rgb(0,0,0);font-family:Helvetica;font-style:no=
rmal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-heig=
ht:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white=
-space:normal;word-spacing:0px"><div style=3D"color:rgb(0,0,0);font-family:=
Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-s=
pacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;te=
xt-transform:none;white-space:normal;word-spacing:0px"><span style=3D"borde=
r-collapse:separate;color:rgb(0,0,0);font-family:Helvetica;font-style:norma=
l;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:=
normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-sp=
ace:normal;word-spacing:0px;border-spacing:0px"><br>Frits Hoogland<br><br><=
a href=3D"http://fritshoogland.wordpress.com/" target=3D"_blank">http://fri=
tshoogland.wordpress.com</a><br><a href=3D"mailto:frits.hoogland@gmail.com"=
 target=3D"_blank">frits.hoogland@gmail.com</a><br><div><span style=3D"text=
-align:-webkit-auto">Mobile: +31 6 14180860</span></div></span></div></div>=
</div></div></div>
</div>
<div><br><blockquote type=3D"cite"><div>On 2 Feb 2020, at 04:50, Tanel Pode=
r &lt;<a href=3D"mailto:tanel@tanelpoder.com" target=3D"_blank">tanel@tanel=
poder.com</a>&gt; wrote:</div><br><div><div dir=3D"ltr"><div dir=3D"ltr"><d=
iv dir=3D"ltr"><div dir=3D"ltr">But your trace file here already indicates =
that there&#39;s some glitch regarding LGWR wakeups/posting:<div><br></div>=
<div><b>Foreground:</b></div><div><br></div><div><div>*** 2020-01-29T<b><fo=
nt color=3D"#0000ff">13:04:20</font></b>.724666-06:00<br>WAIT #0: nam=3D&#3=
9;log file sync&#39; ela=3D 6909711 buffer#=3D58996 sync scn=3D976089279 p3=
=3D0 obj#=3D-1 tim=3D1025228531958<br></div><div><br></div><div><b>LGWR at =
the same time:</b></div><div><br></div><div>*** 2020-01-29T<b><font color=
=3D"#0000ff">13:04:20</font></b>.725994-06:00<br>WAIT #0: nam=3D&#39;rdbms =
ipc message&#39; ela=3D 2918443 timeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=
=3D1025228533313</div></div><div><br></div><div>The WAIT lines are printed =
out when the wait ends (and the *** timestamp line just before the WAIT lin=
e is printed out, if enough time has passed since the last *** line). So th=
is shows that LGWR had happily been idle, waiting for its idle 3-second tim=
eout to expire, or someone to wake it up (that&#39;s what &quot;rdbms ipc m=
essage&quot; means) - while someone had issued a commit and was waiting for=
 LGWR acknowledgment. So this shows that LGWR didn&#39;t even get the post =
from your session, nor any other sessions (if you had concurrent transactio=
nal activity). Back in 11g, typically=C2=A0these problems happened when ada=
ptive log file syncing switch between poll and post/wait wakeup method happ=
ened and you hit a bug. But in 12c there are more moving parts.</div><div><=
br></div><div>--</div><div>Tanel Poder</div><div><a href=3D"https://blog.ta=
nelpoder.com/seminar" target=3D"_blank">https://blog.tanelpoder.com/seminar=
</a></div></div></div></div><br><div class=3D"gmail_quote"><div dir=3D"ltr"=
 class=3D"gmail_attr">On Sat, Feb 1, 2020 at 10:34 PM Tanel Poder &lt;<a hr=
ef=3D"mailto:tanel@tanelpoder.com" target=3D"_blank">tanel@tanelpoder.com</=
a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0p=
x 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><d=
iv dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=
=3D"ltr">Hi Chris,<div><br></div><div>As 12c+ has log writer slaves and Ora=
cle can adaptavely switch=C2=A0from single LGWR to LGWR=C2=A0+ LGxx slave (=
_use_single_log_writer=3Dadaptive), can you check ASH data for the LGxx sla=
ves at this time too. Not all process-to-process interactions/waits are ins=
trumented with a blocking_session in Oracle...</div><div><br></div><div>Sho=
uld be this command:</div><div><br></div><div><div><font face=3D"monospace"=
>@ashtop username,event2,program2,blocking_session,blocking_inst_id &quot;p=
rogram like &#39;%(LG%)&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:04:00&=
#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:05:00&#39;&quot;</font></div><=
/div><div><br></div><div>--<br></div><div>Tanel Poder</div><div><a href=3D"=
https://blog.tanelpoder.com/seminar" target=3D"_blank">https://blog.tanelpo=
der.com/seminar</a></div><div><br></div></div></div></div></div><br><div cl=
ass=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On Thu, Jan 30, 2=
020 at 3:44 PM Chris Stephens &lt;<a href=3D"mailto:cstephens16@gmail.com" =
target=3D"_blank">cstephens16@gmail.com</a>&gt; wrote:<br></div><blockquote=
 class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px so=
lid rgb(204,204,204);padding-left:1ex"><div dir=3D"ltr"><font face=3D"monos=
pace">Here&#39;s some additional ash output from the trace data posted in l=
ast message including event_hist.sql:</font><div><font face=3D"monospace"><=
br></font></div><div><font face=3D"monospace">SQL&gt; @ashtop username,even=
t2,program2,blocking_session,blocking_inst_id 1=3D1 &quot;TIMESTAMP&#39;202=
0-01-29 13:04:20&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:04:27&#39;&qu=
ot;<br>=C2=A0 =C2=A0TOTALSECONDS =C2=A0 =C2=A0AAS =C2=A0 =C2=A0 =C2=A0%This=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 USERNAME =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 EV=
ENT2 =C2=A0 =C2=A0 =C2=A0PROGRAM2 =C2=A0 =C2=A0BLOCKING_SESSION =C2=A0 =C2=
=A0BLOCKING_INST_ID =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 FIRST_SEEN =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0LAST_SEEN =C2=A0 =C2=A0DIST=
_SQLEXEC_SEEN<br>_______________ ______ __________ ________________ _______=
_________ _____________ ___________________ ___________________ ___________=
___________ ______________________ ____________________<br>=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2 =C2=A0 =C2=A00.3 =C2=A0 50% | =C2=A0 =
=C2=A0SYS =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0ON CPU =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 (sqlplus) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2020-01-29 13:04:24 =C2=A0 =C2=A0=
2020-01-29 13:04:25 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 1 =C2=A0 =C2=A00.1 =C2=A0 25% | =C2=A0 =C2=A0GEN3_MGOWER_3 =C2=A0 =C2=A0ON=
 CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (python) =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-29 13=
:04:22 =C2=A0 =C2=A02020-01-29 13:04:22 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A00.1 =C2=A0 25% | =C2=A0 =C2=A0GEN3_MGOW=
ER_3 =C2=A0 =C2=A0log file sync =C2=A0 =C2=A0(python) =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1710 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2 2020-01-29 13:04:20 =C2=A0 =C2=
=A02020-01-29 13:04:20 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 1<br><br><br>SQL&gt; @bg lgwr<br>=C2=A0 =C2=A0NAME=
 =C2=A0 =C2=A0DESCRIPTION =C2=A0 =C2=A0 SID =C2=A0 =C2=A0OPID =C2=A0 =C2=A0=
 SPID =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 PADDR =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 SADDR<br>_______ ______________ _______ =
_______ ________ ___________________ ___________________<br>LGWR =C2=A0 =C2=
=A0Redo etc. =C2=A0 =C2=A0 =C2=A0 =C2=A0 1710 =C2=A0 =C2=A0 =C2=A034 26552 =
=C2=A0 =C2=A000000001E8718860 =C2=A0 =C2=A000000001D8BFF4A0<br></font></div=
><div><font face=3D"monospace"><br></font></div><div><font face=3D"monospac=
e">SQL&gt; @ashtop username,event2,program2,blocking_session,blocking_inst_=
id session_id=3D1710 &quot;TIMESTAMP&#39;2020-01-29 13:04:20&#39;&quot; &qu=
ot;TIMESTAMP&#39;2020-01-29 13:04:27&#39;&quot;<br><br>no rows selected<br>=
<br>SQL&gt; @ashtop username,event2,program2,blocking_session,blocking_inst=
_id session_id=3D1710 &quot;TIMESTAMP&#39;2020-01-29 13:04:00&#39;&quot; &q=
uot;TIMESTAMP&#39;2020-01-29 13:05:00&#39;&quot;<br><br>no rows selected<br=
></font></div><div><font face=3D"monospace"><br></font></div><div><font fac=
e=3D"monospace">SQL&gt; @ash/event_hist &quot;log file&quot; 1=3D1 =C2=A0&q=
uot;TIMESTAMP&#39;2020-01-29 13:04:00&#39;&quot; &quot;TIMESTAMP&#39;2020-0=
1-29 13:05:00&#39;&quot;<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0EVH_EVENT =C2=A0 =C2=
=A0 =C2=A0 EVH_MILLISEC =C2=A0 =C2=A0EVH_SAMPLE_COUNT =C2=A0 =C2=A0EVH_EST_=
EVENT_COUNT =C2=A0 =C2=A0EVH_EST_TOTAL_SEC =C2=A0 =C2=A0PCT_EVT_TIME =C2=A0=
 =C2=A0 =C2=A0 EVH_GRAPH =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 FIRST_SE=
EN =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0LAST_SEEN<br>___________=
_____ __________________ ___________________ ______________________ _______=
_____________ _______________ _______________ ______________________ ______=
________________<br>log file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 &lt; 1024 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A01.8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A01.8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A012.8 |# =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 | =C2=A0 =C2=A02020-01-29 13:04:13 =C2=A0 =C2=A02020-01-29 13:04=
:13<br>log file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &lt; 4096 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A04.1 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A029.1 |### =C2=A0 =C2=A0 =C2=A0 | =
=C2=A0 =C2=A02020-01-29 13:04:12 =C2=A0 =C2=A02020-01-29 13:04:12<br>log fi=
le sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &lt; 8192 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A08.2 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A058.2 |###### =C2=A0 =C2=A0| =C2=A0 =C2=A02020-01=
-29 13:04:20 =C2=A0 =C2=A02020-01-29 13:04:20<br></font></div><div><font fa=
ce=3D"monospace"><br></font></div><div><font face=3D"monospace">SQL&gt; @as=
h/event_hist &quot;write&quot; 1=3D1 =C2=A0&quot;TIMESTAMP&#39;2020-01-29 1=
3:04:00&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:05:00&#39;&quot;<br><b=
r>no rows selected<br><br>SQL&gt; @ash/event_hist &quot;*&quot; 1=3D1 =C2=
=A0&quot;TIMESTAMP&#39;2020-01-29 13:04:00&#39;&quot; &quot;TIMESTAMP&#39;2=
020-01-29 13:05:00&#39;&quot;<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 EVH_EVENT =C2=
=A0 =C2=A0 =C2=A0 EVH_MILLISEC =C2=A0 =C2=A0EVH_SAMPLE_COUNT =C2=A0 =C2=A0E=
VH_EST_EVENT_COUNT =C2=A0 =C2=A0EVH_EST_TOTAL_SEC =C2=A0 =C2=A0PCT_EVT_TIME=
 =C2=A0 =C2=A0 =C2=A0 EVH_GRAPH =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 F=
IRST_SEEN =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0LAST_SEEN<br>____=
_____________ __________________ ___________________ ______________________=
 ____________________ _______________ _______________ _____________________=
_ ______________________<br>log file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0&lt; 1024 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A01.8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A01.8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A012.8 |# =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A02020-01-29 13:04:13 =C2=A0 =C2=A020=
20-01-29 13:04:13<br>log file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0&lt; 4096 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A04.1 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A029.1 |### =C2=A0 =
=C2=A0 =C2=A0 | =C2=A0 =C2=A02020-01-29 13:04:12 =C2=A0 =C2=A02020-01-29 13=
:04:12<br>log file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt=
; 8192 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A08.2 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A058.2 |###### =C2=A0 =C2=A0|=
 =C2=A0 =C2=A02020-01-29 13:04:20 =C2=A0 =C2=A02020-01-29 13:04:20<br>row c=
ache lock =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt; 1 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2469.1 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02.5 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 100 |##########| =C2=A0 =C2=A02020-01-29 13:04:01 =
=C2=A0 =C2=A02020-01-29 13:04:01<br></font></div><div><font face=3D"monospa=
ce"><br></font></div><div><font face=3D"monospace">SQL&gt; @ashtop username=
,event2,program2,blocking_session,blocking_inst_id &quot;lower(program) lik=
e &#39;%lg%&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:04:00&#39;&quot; &=
quot;TIMESTAMP&#39;2020-01-29 13:05:00&#39;&quot;<br><br>no rows selected<b=
r></font></div><div><font face=3D"monospace"><br></font></div><div><font fa=
ce=3D"monospace">i&#39;m going to copy over gv$active_session_history to a =
single instance database and take a look at ash_wait_chains but i don&#39;t=
 expect that to reveal anything either.</font></div><div><font face=3D"mono=
space"><br></font></div><div><font face=3D"monospace">i need to confirm whe=
ther adding redo resolved issue (will have developer fire up workload) but =
I think next step is to investigate x$trace and possible process communicat=
ion issues?</font></div></div><br><div class=3D"gmail_quote"><div dir=3D"lt=
r" class=3D"gmail_attr">On Thu, Jan 30, 2020 at 10:27 AM Chris Stephens &lt=
;<a href=3D"mailto:cstephens16@gmail.com" target=3D"_blank">cstephens16@gma=
il.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"m=
argin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left=
:1ex"><div dir=3D"ltr"><div dir=3D"ltr">Hey=C2=A0Tanel,<div><br></div><div>=
Thanks for taking time to write this up! Will definitely=C2=A0be attending =
that webinar.</div><div><br></div><div>comments inline</div></div><br><div =
class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On Wed, Jan 29,=
 2020 at 10:32 PM Tanel Poder &lt;<a href=3D"mailto:tanel@tanelpoder.com" t=
arget=3D"_blank">tanel@tanelpoder.com</a>&gt; wrote:<br></div><blockquote c=
lass=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px soli=
d rgb(204,204,204);padding-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div=
 dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D=
"ltr"><div dir=3D"ltr"><div>Comments:</div><div><br></div><div>1) First, lo=
oks like you&#39;re using all the right tools for systematic troubleshootin=
g ;-)</div></div></div></div></div></div></div></div></div></blockquote><di=
v><br></div><div>***i try!=C2=A0</div><blockquote class=3D"gmail_quote" sty=
le=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);paddi=
ng-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D=
"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><=
div><br></div><div>2) As &quot;log file sync&quot; is not directly an I/O w=
ait event (IO happens indirectly under the hood), but &quot;wait for LGWR t=
o ACK/increment redo persisting position&quot; then there may be multiple r=
easons for delays. I&#39;m repeating some of what others have said here too=
:</div><div>=C2=A0 - LGWR could be waiting for remote ACK (synchronous data=
 guard) - that should show up as a LNS* wait or something like that though<=
/div><div>=C2=A0 - Process slowness due to CPU overload/scheduling latency=
=C2=A0or swapping (in both cases, LGWR may do its work fast, but it takes a=
 while for the foreground process to wake up and end the wait event - you c=
ould check if there are any reports regarding switching LGWR mode in alert.=
log around that time)</div><div>=C2=A0 - Bug/missed post due to switching b=
etween LGWR progress polling vs. waiting for LGWR post</div><div>=C2=A0 - A=
SM instance communication issues (ASM communication can be on the critical =
path when creating/changing files on ASM and even for reading, when the ASM=
 metadata cache in your local instance doesn&#39;t have what it needs)</div=
><div>=C2=A0 - So, the physical redo log file disk write is not necessarily=
 the biggest contributor to the log file sync wait</div><div><br></div></di=
v></div></div></div></div></div></div></div></blockquote><div>*** no data g=
uard in the env.</div><div>*** we have excess CPU. (20 cores / instance and=
 usually only 1-5 active foreground processes but i will confirm from TFA c=
ollections)</div><div>*** yesterday i was ready to blame bug. today, not so=
 sure. i checked log file switches with a script google found me a few days=
 ago. now thinking that was inaccurate as a script from metastink shows ton=
es of switching so that could very well be related.</div><div>***asm commun=
ication is something i need to look into. TFA has reported some ASM issues =
but there&#39;s no details and i never saw anything in asm alert logs indic=
ating an issue.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" sty=
le=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);paddi=
ng-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D=
"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><=
div></div><div>3) You&#39;ll need to measure what LGWR itself was doing dur=
ing the problem time</div><div>=C2=A0 - Since you&#39;re on 19c (12c or lat=
er) the LGWR has slaves too and they may be doing most of the work=C2=A0</d=
iv><div>=C2=A0 - You&#39;ll need to look into both the LGWR process and any=
 other LG* ones too</div><div>=C2=A0 - Since you have (Oracle) Snapper runn=
ing already and writing to trace you should have the data already</div><div=
 dir=3D"ltr">=C2=A0 - The <b>lgwr</b>=C2=A0in the following <font face=3D"m=
onospace">@snapper ash,stats,trace 10 999999 lgwr</font> syntax includes LG=
WR slaves too<br></div><div dir=3D"ltr">=C2=A0 - As LGWR slaves have their =
own sessions (and SIDs), you should run ashtop etc and report if there&#39;=
s any=C2=A0BLOCKING_SESSION value listed for LGWR and slaves<br></div><div>=
=C2=A0 - If LGWR and the slaves were listed <i>idle</i> during the problem =
time (while others waited for log file sync event at the same time) <i>and =
if=C2=A0</i>there was no huge OS level CPU/memory shortage at the time, thi=
s looks like a background &lt;-&gt; foreground communication problem.=C2=A0=
</div><div>=C2=A0 =C2=A0 =C2=A0- This (and any potential ASM issues) could =
probably be troubleshooted via KST tracing, KSR channels and X$TRACE that c=
ould help here (some of it is enabled by default ( Like I showed in a previ=
ous hacking session: <a href=3D"https://www.youtube.com/watch?v=3DmkmvZv58W=
6w" target=3D"_blank">https://www.youtube.com/watch?v=3DmkmvZv58W6w</a> )</=
div><div><br></div></div></div></div></div></div></div></div></div></blockq=
uote><div><br></div><div>here is all the output i have corresponding to a 6=
 sec &quot;log file sync&quot; wait:</div><div><br></div><div><table style=
=3D"vertical-align:baseline;font-size:12px;line-height:13.5pt;font-family:L=
ato,verdana,&quot;bitstream vera sans&quot;,&quot;dejavu sans&quot;,&quot;l=
iberation sans&quot;,geneva,sans-serif;margin:2ex 0px 2ex 40px;border-colla=
pse:collapse;border:0px"><tbody style=3D"vertical-align:baseline"><tr style=
=3D"vertical-align:baseline"><td style=3D"vertical-align:baseline;padding:0=
.5ex 0px 0.5ex 1em;text-align:right;font-size:10.2px;border:0px;white-space=
:nowrap">Experience duration:</td><td style=3D"vertical-align:baseline;padd=
ing:0.5ex 1em;width:100em"><em style=3D"vertical-align:baseline">R</em>=C2=
=A0=3D=C2=A0<em style=3D"vertical-align:baseline">t</em><span style=3D"vert=
ical-align:0px;font-size:7.2px">1</span>=C2=A0=E2=88=92=C2=A0<em style=3D"v=
ertical-align:baseline">t</em><span style=3D"vertical-align:0px;font-size:7=
.2px">0</span>=C2=A0=3D=C2=A0<span title=3D"This report is relevant only if=
 the end-user experience you are analyzing has consumed the duration shown =
here. If this duration does not correspond to your actual end-user performa=
nce experience, then you should repair your trace data or collect trace dat=
a more carefully for another experience. Time your data collection activati=
on and deactivation to match precisely the begin and end times of the exper=
ience you are analyzing." style=3D"vertical-align:baseline;color:tomato;fon=
t-weight:bold"><a style=3D"vertical-align:baseline;text-decoration-line:non=
e;color:tomato">41.336137=C2=A0seconds</a></span></td></tr><tr style=3D"ver=
tical-align:baseline"><td style=3D"vertical-align:baseline;padding:0.5ex 0p=
x 0.5ex 1em;text-align:right;font-size:10.2px;border:0px;white-space:nowrap=
">Experience begin time:</td><td style=3D"vertical-align:baseline;padding:0=
.5ex 1em;width:100em"><em style=3D"vertical-align:baseline">t</em><span sty=
le=3D"vertical-align:0px;font-size:7.2px">0</span>=C2=A0=3D 2020-01-29T13:0=
4:07.560478</td></tr><tr style=3D"vertical-align:baseline"><td style=3D"ver=
tical-align:baseline;padding:0.5ex 0px 0.5ex 1em;text-align:right;font-size=
:10.2px;border:0px;white-space:nowrap">Experience end time:</td><td style=
=3D"vertical-align:baseline;padding:0.5ex 1em;width:100em"><em style=3D"ver=
tical-align:baseline">t</em><span style=3D"vertical-align:0px;font-size:7.2=
px">1</span>=C2=A0=3D 2020-01-29T13:04:48.896615</td></tr></tbody></table><=
/div>CALL-NAME =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0DURATION =C2=A0 =C2=A0 =C2=A0 % =C2=A0 CALLS =C2=A0=
 =C2=A0 =C2=A0MEAN =C2=A0 =C2=A0 =C2=A0 MIN =C2=A0 =C2=A0 =C2=A0 MAX<br>---=
--------------------------- =C2=A0--------- =C2=A0------ =C2=A0------ =C2=
=A0-------- =C2=A0-------- =C2=A0--------<br>SQL*Net message from client =
=C2=A0 =C2=A0 26.017105 =C2=A0 66.5% =C2=A0 9,886 =C2=A00.002632 =C2=A00.00=
0209 =C2=A07.858051<br>log file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 10.582409 =C2=A0 27.1% =C2=A0 =C2=A0 150 =C2=A00.0=
70549 =C2=A00.000295 =C2=A06.909711<br>EXEC =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2.088757=
 =C2=A0 =C2=A05.3% =C2=A0 9,702 =C2=A00.000215 =C2=A00.000000 =C2=A00.20919=
9<br>DLM cross inst call completion =C2=A0 0.109617 =C2=A0 =C2=A00.3% =C2=
=A0 =C2=A0 212 =C2=A00.000517 =C2=A00.000005 =C2=A00.080575<br>row cache lo=
ck =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0.070118 =
=C2=A0 =C2=A00.2% =C2=A0 =C2=A0 125 =C2=A00.000561 =C2=A00.000208 =C2=A00.0=
00972<br>FETCH =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00.047586 =C2=A0 =C2=A00.1% =C2=A0 1,0=
29 =C2=A00.000046 =C2=A00.000000 =C2=A00.003056<br>PARSE =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A00.045293 =C2=A0 =C2=A00.1% =C2=A0 4,778 =C2=A00.000009 =C2=A00.000000=
 =C2=A00.002692<br>CLOSE =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00.033453 =C2=A0 =C2=A00.1% =
=C2=A0 4,784 =C2=A00.000007 =C2=A00.000000 =C2=A00.000237<br>library cache =
lock =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0.023614 =C2=A0 =C2=
=A00.1% =C2=A0 =C2=A0 =C2=A055 =C2=A00.000429 =C2=A00.000152 =C2=A00.000901=
<br>library cache pin =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A00.022311 =C2=A0 =C2=A00.1% =C2=A0 =C2=A0 =C2=A055 =C2=A00.000406 =C2=A00=
.000153 =C2=A00.000689<br>17 others =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00.071430 =C2=A0 =C2=A00.2% =C2=
=A010,772 =C2=A00.000007 =C2=A00.000000 =C2=A00.001487<br>-----------------=
------------- =C2=A0--------- =C2=A0------ =C2=A0------ =C2=A0-------- =C2=
=A0-------- =C2=A0--------<br><div>TOTAL (27) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A039.111693 =C2=A0100.0% =C2=
=A041,548 =C2=A00.000941 =C2=A00.000000 =C2=A07.858051</div><div><br></div>=
<div>6 sec wait from raw data:</div><div><br></div><div>*** 2020-01-29T13:0=
4:20.724666-06:00<br>WAIT #0: nam=3D&#39;log file sync&#39; ela=3D 6909711 =
buffer#=3D58996 sync scn=3D976089279 p3=3D0 obj#=3D-1 tim=3D1025228531958<b=
r>WAIT #0: nam=3D&#39;SQL*Net message to client&#39; ela=3D 2 driver id=3D1=
413697536 #bytes=3D1 p3=3D0 obj#=3D-1 tim=3D1025228532118<br>WAIT #0: nam=
=3D&#39;SQL*Net message from client&#39; ela=3D 1711 driver id=3D1413697536=
 #bytes=3D1 p3=3D0 obj#=3D-1 tim=3D1025228533852<br>BINDS #140204587734864:=
<br></div><div><br></div><div>i enabled 10046 on lgwr:</div><div><br></div>=
<div>*** 2020-01-29T13:04:20.725994-06:00<br>WAIT #0: nam=3D&#39;rdbms ipc =
message&#39; ela=3D 2918443 timeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=3D102=
5228533313<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D 279 f=
iles=3D1 blocks=3D17 requests=3D2 obj#=3D-1 tim=3D1025228533777<br>WAIT #0:=
 nam=3D&#39;log file parallel write&#39; ela=3D 171 files=3D1 blocks=3D3 re=
quests=3D2 obj#=3D-1 tim=3D1025228534104<br>WAIT #0: nam=3D&#39;LGWR wait f=
or redo copy&#39; ela=3D 22 copy latch #=3D28 p2=3D0 p3=3D0 obj#=3D-1 tim=
=3D1025228534182<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D=
 183 files=3D1 blocks=3D2 requests=3D2 obj#=3D-1 tim=3D1025228534437<br>WAI=
T #0: nam=3D&#39;log file parallel write&#39; ela=3D 174 files=3D1 blocks=
=3D2 requests=3D2 obj#=3D-1 tim=3D1025228534721<br>WAIT #0: nam=3D&#39;rdbm=
s ipc message&#39; ela=3D 31 timeout=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025=
228534786<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 3233 timeout=
=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228538038<br>WAIT #0: nam=3D&#39;log=
 file parallel write&#39; ela=3D 170 files=3D1 blocks=3D2 requests=3D1 obj#=
=3D-1 tim=3D1025228538297<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=
=3D 25 timeout=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228538358<br>WAIT #0: =
nam=3D&#39;rdbms ipc message&#39; ela=3D 11829 timeout=3D8 p2=3D0 p3=3D0 ob=
j#=3D-1 tim=3D1025228550204<br>WAIT #0: nam=3D&#39;LGWR wait for redo copy&=
#39; ela=3D 52 copy latch #=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228550294=
<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D 239 files=3D1 b=
locks=3D7 requests=3D1 obj#=3D-1 tim=3D1025228550624<br>WAIT #0: nam=3D&#39=
;rdbms ipc message&#39; ela=3D 23 timeout=3D6 p2=3D0 p3=3D0 obj#=3D-1 tim=
=3D1025228550699<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 16750=
 timeout=3D6 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228567467<br>WAIT #0: nam=3D=
&#39;log file parallel write&#39; ela=3D 207 files=3D1 blocks=3D9 requests=
=3D1 obj#=3D-1 tim=3D1025228567762<br>WAIT #0: nam=3D&#39;rdbms ipc message=
&#39; ela=3D 26 timeout=3D5 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228567825<br>=
WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 16249 timeout=3D5 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228584091<br>WAIT #0: nam=3D&#39;log file parall=
el write&#39; ela=3D 213 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D=
1025228584400<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 48 timeo=
ut=3D3 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228584502<br>WAIT #0: nam=3D&#39;r=
dbms ipc message&#39; ela=3D 16466 timeout=3D3 p2=3D0 p3=3D0 obj#=3D-1 tim=
=3D1025228600993<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D=
 228 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025228601393<br>WAI=
T #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 29 timeout=3D1 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228601482<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39=
; ela=3D 9548 timeout=3D1 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228611053<br>WA=
IT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 7467 timeout=3D300 p2=3D0 p=
3=3D0 obj#=3D-1 tim=3D1025228618574<br>WAIT #0: nam=3D&#39;LGWR wait for re=
do copy&#39; ela=3D 13 copy latch #=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D10252=
28618636<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D 222 fil=
es=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025228618960<br>WAIT #0: na=
m=3D&#39;rdbms ipc message&#39; ela=3D 31 timeout=3D300 p2=3D0 p3=3D0 obj#=
=3D-1 tim=3D1025228619066<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=
=3D 15946 timeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228635045<br>WAIT=
 #0: nam=3D&#39;log file parallel write&#39; ela=3D 240 files=3D1 blocks=3D=
9 requests=3D1 obj#=3D-1 tim=3D1025228635427<br>WAIT #0: nam=3D&#39;rdbms i=
pc message&#39; ela=3D 33 timeout=3D298 p2=3D0 p3=3D0 obj#=3D-1 tim=3D10252=
28635525<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 15987 timeout=
=3D298 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228651537<br>WAIT #0: nam=3D&#39;l=
og file parallel write&#39; ela=3D 255 files=3D1 blocks=3D9 requests=3D1 ob=
j#=3D-1 tim=3D1025228651912<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; e=
la=3D 51 timeout=3D296 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228652034<br>WAIT =
#0: nam=3D&#39;rdbms ipc message&#39; ela=3D 16332 timeout=3D296 p2=3D0 p3=
=3D0 obj#=3D-1 tim=3D1025228668405<br>WAIT #0: nam=3D&#39;log file parallel=
 write&#39; ela=3D 256 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D10=
25228668804<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 37 timeout=
=3D295 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228668911<br>WAIT #0: nam=3D&#39;r=
dbms ipc message&#39; ela=3D 16969 timeout=3D295 p2=3D0 p3=3D0 obj#=3D-1 ti=
m=3D1025228685907<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=
=3D 273 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025228686327<br>=
WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 38 timeout=3D293 p2=3D0 p=
3=3D0 obj#=3D-1 tim=3D1025228686453<br>WAIT #0: nam=3D&#39;rdbms ipc messag=
e&#39; ela=3D 6908 timeout=3D293 p2=3D0 p3=3D0 obj#=3D-1 tim=3D102522869339=
0<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D 247 files=3D1 =
blocks=3D2 requests=3D1 obj#=3D-1 tim=3D1025228693768<br>WAIT #0: nam=3D&#3=
9;rdbms ipc message&#39; ela=3D 38 timeout=3D292 p2=3D0 p3=3D0 obj#=3D-1 ti=
m=3D1025228693862j<br></div><div>=C2=A0...more of the same</div><div>264396=
 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 32 timeout=3D52 p2=3D0 p=
3=3D0 obj#=3D-1 tim=3D1025231090742<br>=C2=A0264397 WAIT #0: nam=3D&#39;rdb=
ms ipc message&#39; ela=3D 19460 timeout=3D52 p2=3D0 p3=3D0 obj#=3D-1 tim=
=3D1025231110227<br>=C2=A0264398 WAIT #0: nam=3D&#39;log file parallel writ=
e&#39; ela=3D 272 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231=
110633<br>=C2=A0264399 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 37=
 timeout=3D50 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231110748<br>=C2=A0264400 W=
AIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 15484 timeout=3D50 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025231126261<br>=C2=A0264401 WAIT #0: nam=3D&#39;lo=
g file parallel write&#39; ela=3D 247 files=3D1 blocks=3D4 requests=3D1 obj=
#=3D-1 tim=3D1025231126641<br>=C2=A0264402 WAIT #0: nam=3D&#39;rdbms ipc me=
ssage&#39; ela=3D 36 timeout=3D49 p2=3D0 p3=3D0 obj#=3D-1 tim=3D10252311267=
53<br>=C2=A0264403 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 16732 =
timeout=3D49 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231143514<br>=C2=A0264404 WA=
IT #0: nam=3D&#39;LGWR wait for redo copy&#39; ela=3D 31 copy latch #=3D8 p=
2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231143603<br>=C2=A0264405 WAIT #0: nam=3D&=
#39;log file parallel write&#39; ela=3D 279 files=3D1 blocks=3D9 requests=
=3D1 obj#=3D-1 tim=3D1025231143976<br>=C2=A0264406 WAIT #0: nam=3D&#39;rdbm=
s ipc message&#39; ela=3D 35 timeout=3D47 p2=3D0 p3=3D0 obj#=3D-1 tim=3D102=
5231144093<br>=C2=A0264407 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=
=3D 5384 timeout=3D47 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231149515<br>=C2=A0=
264408 WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D 255 files=3D=
1 blocks=3D4 requests=3D1 obj#=3D-1 tim=3D1025231149889<br>=C2=A0264409 WAI=
T #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 32 timeout=3D47 p2=3D0 p3=3D=
0 obj#=3D-1 tim=3D1025231149989<br>=C2=A0264410<br>=C2=A0264411 *** 2020-01=
-29T13:04:23.802774-06:00<br>=C2=A0264412 WAIT #0: nam=3D&#39;rdbms ipc mes=
sage&#39; ela=3D 460056 timeout=3D46 p2=3D0 p3=3D0 obj#=3D-1 tim=3D10252316=
10086<br>=C2=A0264413<br>=C2=A0264414 *** 2020-01-29T13:04:26.797925-06:00<=
br>=C2=A0264415 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 2995027 t=
imeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025234605238<br>=C2=A0264416 WA=
IT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 10649 timeout=3D1 p2=3D0 p3=
=3D0 obj#=3D-1 tim=3D1025234616054<br>=C2=A0264417<br></div><div><br></div>=
<div>i will get you snapper lgwr output associated w/ this interval as soon=
 as i can find it (can&#39;t remember where i stashed it and lost trace fil=
e name .. fml)</div><div><br></div><blockquote class=3D"gmail_quote" style=
=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding=
-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"l=
tr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><di=
v></div><div>4) Paste the full <b>Snapper</b> output (of one time snapshot =
from the problem time) here if you can - we&#39;ll be able to see what the =
slaves were doing</div><div>=C2=A0 - You&#39;d need to include not only LGW=
R SID but also any LG*=C2=A0slave SIDs too</div><div>=C2=A0 - Also we&#39;l=
l see better LGWR delay metrics from V$SESSTAT that go beyond just wait eve=
nts</div><div>=C2=A0 - Thinking of metrics like these - snapper already han=
dles them (if you look into the rightmost column)</div><div><div>=C2=A0 =C2=
=A0 =C2=A0- <b>redo synch time overhead (usec)</b> - &quot;FG wakeup overhe=
ad per log file sync&quot;</div><div>=C2=A0 =C2=A0 =C2=A0- <b>redo write ti=
me</b> - accounting end to end redo write time (not just the log file paral=
lel write syscall) &quot;per redo write&quot;</div></div><div><br></div><di=
v>5) Regarding the <b>ASH wait chains</b> error on 19c - I&#39;m aware of t=
his but forgot to look into it</div><div>=C2=A0 =C2=A0- It has (probably) s=
omething to do with GV$ views (or PX slaves used for PDB V$ queries) and po=
ssibly the connect by I&#39;m using</div><div>=C2=A0 =C2=A0- Does it work i=
n the CDB (if multitenant)?</div><div>=C2=A0 =C2=A0- Does it work with lowe=
r optimizer features enabled?</div><div>=C2=A0 =C2=A0- As we are already ma=
nually looking into what LGWR and its slaves are doing, there&#39;s no need=
 for the wait chains script here</div><div><br></div></div></div></div></di=
v></div></div></div></div></blockquote><div>***oracle has filed a bug. alte=
ring optimizer_features_enabled doesn&#39;t help.</div><div>=C2=A0</div><bl=
ockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-lef=
t:1px solid rgb(204,204,204);padding-left:1ex"><div dir=3D"ltr"><div dir=3D=
"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><=
div dir=3D"ltr"><div dir=3D"ltr"><div></div><div>6) Linux Process Snapper -=
 pSnapper</div><div>=C2=A0 =C2=A0- If you want to see the full &quot;wait&q=
uot; states of Linux processes/threads, use the<b> -a </b>option.</div><div=
>=C2=A0 =C2=A0- It&#39;s somewhat like always showing &quot;idle&quot; thre=
ads too</div><div>=C2=A0 =C2=A0- But for linux apps it&#39;s harder to dete=
rmine what&#39;s idle and what&#39;s some network RPC wait, so I currently =
just do what Linux &quot;load&quot; metric does</div><div>=C2=A0 =C2=A0- I =
just shows threads in Runnable and Disk/Uninterruptible sleep states by def=
ault (-a shows=C2=A0all states, including Sleep and kernel-Idle threads etc=
)</div><div>=C2=A0 =C2=A0- So the Linux pSnapper &quot;idle&quot; means a s=
omewhat different thing than Oracle &quot;idle&quot; - say LGWR was waiting=
 for a Data Guard ACK, that should not be shown as Idle in Oracle, but at t=
he Linux process level it&#39;s &quot;just some network socket wait&quot; a=
nd you&#39;d need to use <b>-a</b>=C2=A0option to show all of them</div></d=
iv></div></div></div></div></div></div></div></blockquote><div><br></div><d=
iv>*** will add that option to future collections (and watch the video/read=
 help)=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px=
 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div di=
r=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"lt=
r"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div><br></div><div>7=
) Were LGWR/LG* processes mostly idle during the problem time?</div><div>=
=C2=A0 =C2=A0- As I kept reading the thread, it looks like ASH reported LGW=
R being mostly idle during the problem time</div><div>=C2=A0 =C2=A0- This i=
s not exactly the same pattern, but I&#39;ve seen cases in the past where e=
veryone&#39;s waiting for LGWR and LGWR doesn&#39;t seem to be busy and doe=
sn&#39;t show any significant waits</div><div>=C2=A0 =C2=A0- It turned out =
that LGWR was actually waiting for CKPT to make progress, but it wasn&#39;t=
 synchronously waiting, but just going to sleep and &quot;checking again la=
ter&quot; in a loop</div><div>=C2=A0 =C2=A0- I wrote about this here: <a hr=
ef=3D"https://blog.tanelpoder.com/posts/log-file-switch-checkpoint-incomple=
te-and-lgwr-waiting-for-checkpoint-progress/" target=3D"_blank">https://blo=
g.tanelpoder.com/posts/log-file-switch-checkpoint-incomplete-and-lgwr-waiti=
ng-for-checkpoint-progress/</a></div><div>=C2=A0 =C2=A0- So it may still ma=
ke sense to check what LGWR/LG* processes are doing even if they don&#39;t =
seem to be active much (like do you see an occasional controlfile read wait=
 by LGWR etc)</div><div>=C2=A0 =C2=A0- SQL*Trace would capture all activity=
 of LGWR, but may slow things down further, so you could also sample &amp; =
spool V$SESSION_WAIT_HISTORY (that keeps a log of last 10 waits of every se=
ssion in memory)</div><div><br></div></div></div></div></div></div></div></=
div></div></blockquote><div>*** i think i&#39;ve addressed most of this.</d=
iv><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0=
px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div =
dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"=
ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div></div><div>Inc=
identally I&#39;m doing another hacking session tomorrow (Thursday 30 Jan @=
 12pm ET) about complex Oracle performance problems that span multiple laye=
rs and have multiple moving parts. I can add some LGWR troubleshooting magi=
c in the end of that (here goes my plan to fit the session in 1.5h again!).=
 I&#39;ll post the details in my blog in a few minutes.</div><div><br></div=
></div></div></div></div></div></div></div></div></blockquote><div>*** can&=
#39;t wait!!=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0=
px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><=
div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=
=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div></div><div=
>--</div><div>Tanel Poder</div><div><a href=3D"https://blog.tanelpoder.com/=
seminar" target=3D"_blank">https://blog.tanelpoder.com/seminar</a></div><di=
v><br></div></div></div></div></div></div></div><br><div class=3D"gmail_quo=
te"><div dir=3D"ltr" class=3D"gmail_attr">On Tue, Jan 28, 2020 at 2:22 PM C=
hris Stephens &lt;<a href=3D"mailto:cstephens16@gmail.com" target=3D"_blank=
">cstephens16@gmail.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_=
quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,=
204);padding-left:1ex"><div dir=3D"ltr"><div><font face=3D"monospace">doesn=
&#39;t look like any ash data for lgwr:</font></div><div><font face=3D"mono=
space"><br></font></div><font face=3D"monospace">SQL&gt; @ashtop session_id=
,blocking_session,event2 1=3D1 &quot;TIMESTAMP&#39;2020-01-27 15:50:48&#39;=
&quot; &quot;TIMESTAMP&#39;2020-01-27 15:52:40&#39;&quot;<br>=C2=A0 =C2=A0T=
OTALSECONDS =C2=A0 =C2=A0AAS =C2=A0 =C2=A0 =C2=A0%This =C2=A0 =C2=A0SESSION=
_ID =C2=A0 =C2=A0BLOCKING_SESSION =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0EVENT2 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 FIRST_SEEN =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0LAST_SEEN =C2=A0 =C2=A0DIST=
_SQLEXEC_SEEN<br>_______________ ______ __________ _____________ __________=
_________ _____________________ ______________________ ____________________=
__ ____________________<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
81 =C2=A0 =C2=A00.7 =C2=A0 82% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
1713 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01710 log file s=
ync =C2=A0 =C2=A0 =C2=A0 =C2=A0 2020-01-27 15:50:54 =C2=A0 =C2=A02020-01-27=
 15:52:14 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 5 =C2=
=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A05% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 1221 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-2=
7 15:50:59 =C2=A0 =C2=A02020-01-27 15:52:14 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 3 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A03% | =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0246 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-27 15:51:00 =C2=A0 =C2=A02020-01-27 =
15:52:00 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 3 =C2=A0 =
=C2=A0 =C2=A00 =C2=A0 =C2=A03% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
1713 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-27 15=
:50:51 =C2=A0 =C2=A02020-01-27 15:52:27 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0123 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-27 15:51:35 =C2=A0 =C2=A02020-01-27 15:5=
1:35 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=
=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0127 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-2=
7 15:51:50 =C2=A0 =C2=A02020-01-27 15:51:50 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0252 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A02321 latch free =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A02020-01-27 15:52:28 =C2=A0 =C2=A02020-01-27 15:52:28 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=
=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0978 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ges remote m=
essage =C2=A0 =C2=A02020-01-27 15:50:48 =C2=A0 =C2=A02020-01-27 15:50:48 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =
=C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A09=
83 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 la=
tch free =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-27 15:52:28 =C2=
=A0 =C2=A02020-01-27 15:52:28 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 1713 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 library cache lock =C2=A0 =C2=A02020-01-27 15:50:48 =
=C2=A0 =C2=A02020-01-27 15:50:48 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 1831 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A02020-01-27 15:50:49 =C2=A0 =C2=A02020-01-27 15:50:49 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=
<br><br>11 rows selected.<br><br><br>SQL&gt; @ashtop session_id,blocking_se=
ssion,event2 session_id=3D1710 &quot;TIMESTAMP&#39;2020-01-27 15:50:48&#39;=
&quot; &quot;TIMESTAMP&#39;2020-01-27 15:52:40&#39;&quot;<br><br>no rows se=
lected</font><br></div><br><div class=3D"gmail_quote"><div dir=3D"ltr" clas=
s=3D"gmail_attr">On Tue, Jan 28, 2020 at 1:09 PM Noveljic Nenad &lt;<a href=
=3D"mailto:nenad.noveljic@vontobel.com" target=3D"_blank">nenad.noveljic@vo=
ntobel.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=
=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding=
-left:1ex">




<div>
You=E2=80=99re using v0.2, but =C2=A0v0.6 should be the latest version. 0.2=
 wouldn=E2=80=99t show idle blockers.
<div><br>
</div>
<div>I looked at the ashtop output once again. Strange that lgwr processes =
nor any other bg processes show up significantly there. The question is, wh=
at was lgwr doing while the fg processes were waiting on him.<br>
<br></div>

</div>

</blockquote></div>
</blockquote></div></div></div>
</blockquote></div></div>
</blockquote></div>
</blockquote></div></div>
</blockquote></div></div>
</div></blockquote></div><br></div></div></blockquote></div>

--0000000000007868c6059d998d60--
--
http://www.freelists.org/webpage/oracle-l


