Return-Path: <oracle-l-bounce@freelists.org>
X-Original-To: oracle-l@orafaq.com
Delivered-To: oracle-l@orafaq.com
Received: from turing.freelists.org (turing.freelists.org [206.53.239.180])
 by malta2546.startdedicated.com (Postfix) with ESMTPS id 2B3B91002BD363
 for <oracle-l@orafaq.com>; Sun,  2 Feb 2020 13:52:55 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id DC32726509;
 Sun,  2 Feb 2020 07:52:52 -0500 (EST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=freelists.org;
 s=turing; t=1580647973;
 bh=00vss8wXtr6yTzA17c6z+UjW6bRYOWqLh+OdRvDtKYU=;
 h=From:Sender:Sender:From;
 b=CCVsIhsiZLpCl0KIBhmLybPYosNbAM2BcwoKL77ZHW+qupMVdXq+XYXrA9lZ1aKOO
	 lF6xor2V7cOUfXdTjcLl+LOCnRlN8fjK/H/WlpvMpTOkT1fW3dY9Eawh1pPGcqiQqN
	 VrS6Hp/Hz4mswhLlG9qje6wAv2SarQGctCu1KxRI=
X-Virus-Scanned: Debian amavisd-new at turing.freelists.org
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id FzG0vjMk1Rcb; Sun,  2 Feb 2020 07:52:52 -0500 (EST)
Received: from turing.freelists.org (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id D799926501;
 Sun,  2 Feb 2020 07:52:03 -0500 (EST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=freelists.org;
 s=turing; t=1580647968;
 bh=00vss8wXtr6yTzA17c6z+UjW6bRYOWqLh+OdRvDtKYU=;
 h=From:Sender:Sender:From;
 b=RlT4YykZwJZKWbCSalOvaoc0A+QLHO63htbngojRk6LJjV3qyjH9q2dWIMf12UB3M
	 tClRZxmiTDZm9952faHbcgwVTtDpvCa1zewxIKQK4LLIYPft8eFoBdZpVX7aDwSiFK
	 km1Zc9BPMl12clzvDQpBjysImgL/u5nwaqlY/3Lo=
Received: with ECARTIS (v1.0.0; list oracle-l); Sun, 02 Feb 2020 07:51:18 -0500 (EST)
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 8CE3826500
 for <Oracle-L@freelists.org>; Sun,  2 Feb 2020 07:51:18 -0500 (EST)
Authentication-Results: turing.freelists.org;
 dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="dMbDEDFf";
 dkim-atps=neutral
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id be88g_uQhO5H for <Oracle-L@freelists.org>;
 Sun,  2 Feb 2020 07:51:18 -0500 (EST)
Received: from mail-wr1-f54.google.com (mail-wr1-f54.google.com [209.85.221.54])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id E7668264FB
 for <Oracle-L@freelists.org>; Sun,  2 Feb 2020 07:51:17 -0500 (EST)
Received: by mail-wr1-f54.google.com with SMTP id y17so14469603wrh.5
        for <Oracle-L@freelists.org>; Sun, 02 Feb 2020 04:51:17 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:message-id:mime-version:subject:date
         :in-reply-to:cc:to:references;
        bh=ZlZ+pJ4wGqGs/DpRo5AaByuMj5V+YtYKRJ2XzCm6jwk=;
        b=maCSYDE8oHdxm8JZQhyUbVHxjr8edcm35dzCbwUjRvR9BKc1xWgYcrwbC7JIAcMLvm
         9WZID7/WKPU7L3iboVtN3psmAF1q0idThUcgUJtRrx9nzdAUU6a6ayo2YNCSPoJ5WZtm
         ohRbFtIxnAJulXaIM7ST5JsAz7kfARKkecZhaXXiEwLOqNiabx5+2vhSVdIjTOKSC7Kd
         zXpKaUN1j4wYZjNe+W+cQJ3Xb5dZn6S4t2YGqCO+Dhirj/3+5OwIiuGN5CsDW1Wbxqpi
         v7TdvDZWJ2zrrgsprqul7qAB+oXk2ZP4aBsEP2yw880eIacfQsTmqRJtfOdo/pgSMc+k
         GoLA==
X-Gm-Message-State: APjAAAVbxuAM44SF4nlbgCPDRfyvJRXsoQOHhtxkjyCkiyuAkVxX5ahs
 1gSm7FlIlDESsq9eYgg+7QEuk3PU2u8=
X-Google-Smtp-Source: APXvYqxHUQMz9bLZnfj+0CiqBlV+fTxOOajqwCvw2WYnFZLpNagcvAsjuVDEd54GpIKgBWFywWfjhw==
X-Received: by 2002:adf:ea88:: with SMTP id s8mr10041380wrm.293.1580647876697;
        Sun, 02 Feb 2020 04:51:16 -0800 (PST)
Received: from amac02qgdycg8wn.local (84-106-29-50.cable.dynamic.v4.ziggo.nl. [84.106.29.50])
        by smtp.gmail.com with ESMTPSA id t10sm18614554wmi.40.2020.02.02.04.51.15
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Sun, 02 Feb 2020 04:51:15 -0800 (PST)
From: Frits Hoogland <frits.hoogland@gmail.com>
Message-Id: <5571F2DD-2C37-44EB-AAF6-A692E0779AD1@gmail.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_68151583-789F-4A89-9EA9-0126EF206DEE"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\))
Subject: Re: intermittent long "log file sync" waits
Date: Sun, 2 Feb 2020 13:51:14 +0100
In-Reply-To: <CAMHX9JLW_i_c+7V0jMwXhWeAKNvUhEdVkf0sKEyHg62XyZP3aA@mail.gmail.com>
Cc: Noveljic Nenad <nenad.noveljic@vontobel.com>,
 oracle-l <Oracle-L@freelists.org>
To: Tanel Poder <tanel@tanelpoder.com>,
 Chris Stephens <cstephens16@gmail.com>
References: <CAEFL0szEBH0P-hqb39Pmjw-pdJ60OZ+RUuFoqCXsC7+=iPEFzQ@mail.gmail.com>
 <38763_1580228753_5E306091_38763_12713_1_cf89e1868e2e40949f6177bf1c604696@vontobel.com>
 <CAEFL0swS2CGeDF2pGWHDBQkO8J1cH7cMbDCxBngx613UW52A+Q@mail.gmail.com>
 <39760_1580232621_5E306FAD_39760_6519_1_238f704e660247f78d5e2f6a5c98b27f@vontobel.com>
 <CAEFL0szP5+-ZUog8Q=P0TiusLw6MpDDM46hZM=7ps0v0o3p_OQ@mail.gmail.com>
 <38763_1580238546_5E3086D2_38763_18592_1_8d9cba2aa89d4aba82f0b60173245848@vontobel.com>
 <CAEFL0szn17WU9kSj7ipRhieXd4w_3hzhU5gSmxDxhx1=WnqDkw@mail.gmail.com>
 <CAMHX9J+FooJWWr896aFx7oqeMk=PKA__GaHjxjU_gsOyE1Tzjg@mail.gmail.com>
 <CAEFL0sx7fAF89Qike6PS2Jy9-N5S-w-r31XQ7kj8AEGP5FonTg@mail.gmail.com>
 <CAEFL0syR4Xy2fPiL4M3jtmmfj_02MmYgC_haCM=LDZ17ikuGgg@mail.gmail.com>
 <CAMHX9JLchyEEvh4mHH8hHXEa53r5FP4Ck-kpCMLUiTgOJzYuEw@mail.gmail.com>
 <CAMHX9JLW_i_c+7V0jMwXhWeAKNvUhEdVkf0sKEyHg62XyZP3aA@mail.gmail.com>
X-archive-position: 76131
X-ecartis-version: Ecartis v1.0.0
Sender: oracle-l-bounce@freelists.org
Errors-to: oracle-l-bounce@freelists.org
X-original-sender: frits.hoogland@gmail.com
Precedence: normal
Reply-To: frits.hoogland@gmail.com
List-Help: <mailto:ecartis@freelists.org?Subject=help>
List-Unsubscribe: <mailto:oracle-l-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: oracle-l <oracle-l.freelists.org>
X-List-ID: oracle-l <oracle-l.freelists.org>
List-Subscribe: <mailto:oracle-l-request@freelists.org?Subject=subscribe>
List-Owner: <mailto:>
List-post: <mailto:oracle-l@freelists.org>
List-Archive: <http://www.freelists.org/archives/oracle-l>
X-list: oracle-l
--Apple-Mail=_68151583-789F-4A89-9EA9-0126EF206DEE
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
 charset=utf-8

I didn=E2=80=99t look that deep, I tried to get an idea of the situation =
and the current state.

This, as others have concluded already, then looks like the bug that =
seems to surface every once in a while where sessions that use post-wait =
for waiting for the logwriter get =E2=80=9Cforgotten=E2=80=9D when the =
instance switches to polling.
The switching is visible, or at least was visible when I looked at it in =
versions 11.2 and 12.1.0.2, by looking at the logwriter trace file, it =
mentions when the logwriter switches between post-wait and polling and =
back.
However, in these cases, the process would way longer, until the =
logwriter decided to switch back to post-wait and found the waiting =
sessions again.

When this is a bug with the above described issue, of course it needs to =
be fixed. However, there is a way around it:
- In post-wait mode, a process waiting in =E2=80=98log file sync=E2=80=99 =
can only end the wait if two conditions are met: it needs to be posted =
and the on_disk_scn must be beyond the process=E2=80=99 commit SCN. =
Based on what is shown, I assume the logwriter happily gone on and had =
the on disk scn updated (it does that always every 3 seconds) probably =
beyond the process=E2=80=99 commit scn. So the most logical thing the =
process is waiting for, is to be posted, which did not happen.
- In polling mode, a process will wait for a pre-computed time, during =
which it can not be notified, and then check for the on disk scn. The =
pre-computed time in normal circumstances is really low, probably around =
a millisecond, but computed based on the latency of your system. The =
important bit is there is no need for the logwriter to do anything: the =
waiting is based on the progress that the logwriter makes by means of =
the on disk scn, but it doesn=E2=80=99t require additional work, like =
being posted.=20
- Therefore, if you fix your instances to use polling only, it can not =
come in the situation of a post-wait process being forgotten. This can =
be achieved by setting =E2=80=9C_use_adaptive_log_file_sync=E2=80=9D to =
polling_only. This is an underscore parameter, and therefore you have to =
ask blessing from oracle support to use it, but since you already got a =
SR open, this should be simple to do.

I can=E2=80=99t think of any downsides of using polling_only. Maybe =
someone can?
Even the use of it, since it=E2=80=99s switched to dynamically by all =
supported versions, is high.

Frits Hoogland

http://fritshoogland.wordpress.com <http://fritshoogland.wordpress.com/>
frits.hoogland@gmail.com <mailto:frits.hoogland@gmail.com>
Mobile: +31 6 14180860

> On 2 Feb 2020, at 04:50, Tanel Poder <tanel@tanelpoder.com> wrote:
>=20
> But your trace file here already indicates that there's some glitch =
regarding LGWR wakeups/posting:
>=20
> Foreground:
>=20
> *** 2020-01-29T13:04:20.724666-06:00
> WAIT #0: nam=3D'log file sync' ela=3D 6909711 buffer#=3D58996 sync =
scn=3D976089279 p3=3D0 obj#=3D-1 tim=3D1025228531958
>=20
> LGWR at the same time:
>=20
> *** 2020-01-29T13:04:20.725994-06:00
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 2918443 timeout=3D300 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228533313
>=20
> The WAIT lines are printed out when the wait ends (and the *** =
timestamp line just before the WAIT line is printed out, if enough time =
has passed since the last *** line). So this shows that LGWR had happily =
been idle, waiting for its idle 3-second timeout to expire, or someone =
to wake it up (that's what "rdbms ipc message" means) - while someone =
had issued a commit and was waiting for LGWR acknowledgment. So this =
shows that LGWR didn't even get the post from your session, nor any =
other sessions (if you had concurrent transactional activity). Back in =
11g, typically these problems happened when adaptive log file syncing =
switch between poll and post/wait wakeup method happened and you hit a =
bug. But in 12c there are more moving parts.
>=20
> --
> Tanel Poder
> https://blog.tanelpoder.com/seminar =
<https://blog.tanelpoder.com/seminar>
> On Sat, Feb 1, 2020 at 10:34 PM Tanel Poder <tanel@tanelpoder.com =
<mailto:tanel@tanelpoder.com>> wrote:
> Hi Chris,
>=20
> As 12c+ has log writer slaves and Oracle can adaptavely switch from =
single LGWR to LGWR + LGxx slave (_use_single_log_writer=3Dadaptive), =
can you check ASH data for the LGxx slaves at this time too. Not all =
process-to-process interactions/waits are instrumented with a =
blocking_session in Oracle...
>=20
> Should be this command:
>=20
> @ashtop username,event2,program2,blocking_session,blocking_inst_id =
"program like '%(LG%)'" "TIMESTAMP'2020-01-29 13:04:00'" =
"TIMESTAMP'2020-01-29 13:05:00'"
>=20
> --
> Tanel Poder
> https://blog.tanelpoder.com/seminar =
<https://blog.tanelpoder.com/seminar>
>=20
>=20
> On Thu, Jan 30, 2020 at 3:44 PM Chris Stephens <cstephens16@gmail.com =
<mailto:cstephens16@gmail.com>> wrote:
> Here's some additional ash output from the trace data posted in last =
message including event_hist.sql:
>=20
> SQL> @ashtop =
username,event2,program2,blocking_session,blocking_inst_id 1=3D1 =
"TIMESTAMP'2020-01-29 13:04:20'" "TIMESTAMP'2020-01-29 13:04:27'"
>    TOTALSECONDS    AAS      %This         USERNAME           EVENT2    =
  PROGRAM2    BLOCKING_SESSION    BLOCKING_INST_ID             =
FIRST_SEEN              LAST_SEEN    DIST_SQLEXEC_SEEN
> _______________ ______ __________ ________________ ________________ =
_____________ ___________________ ___________________ =
______________________ ______________________ ____________________
>               2    0.3   50% |    SYS              ON CPU           =
(sqlplus)                                             2020-01-29 =
13:04:24    2020-01-29 13:04:25                       1
>               1    0.1   25% |    GEN3_MGOWER_3    ON CPU           =
(python)                                              2020-01-29 =
13:04:22    2020-01-29 13:04:22                       1
>               1    0.1   25% |    GEN3_MGOWER_3    log file sync    =
(python)                     1710                   2 2020-01-29 =
13:04:20    2020-01-29 13:04:20                       1
>=20
>=20
> SQL> @bg lgwr
>    NAME    DESCRIPTION     SID    OPID     SPID               PADDR    =
           SADDR
> _______ ______________ _______ _______ ________ ___________________ =
___________________
> LGWR    Redo etc.         1710      34 26552    00000001E8718860    =
00000001D8BFF4A0
>=20
> SQL> @ashtop =
username,event2,program2,blocking_session,blocking_inst_id =
session_id=3D1710 "TIMESTAMP'2020-01-29 13:04:20'" "TIMESTAMP'2020-01-29 =
13:04:27'"
>=20
> no rows selected
>=20
> SQL> @ashtop =
username,event2,program2,blocking_session,blocking_inst_id =
session_id=3D1710 "TIMESTAMP'2020-01-29 13:04:00'" "TIMESTAMP'2020-01-29 =
13:05:00'"
>=20
> no rows selected
>=20
> SQL> @ash/event_hist "log file" 1=3D1  "TIMESTAMP'2020-01-29 =
13:04:00'" "TIMESTAMP'2020-01-29 13:05:00'"
>        EVH_EVENT       EVH_MILLISEC    EVH_SAMPLE_COUNT    =
EVH_EST_EVENT_COUNT    EVH_EST_TOTAL_SEC    PCT_EVT_TIME       EVH_GRAPH =
            FIRST_SEEN              LAST_SEEN
> ________________ __________________ ___________________ =
______________________ ____________________ _______________ =
_______________ ______________________ ______________________
> log file sync             < 1024                      1                =
    1.8                  1.8            12.8 |#         |    2020-01-29 =
13:04:13    2020-01-29 13:04:13
> log file sync             < 4096                      1                =
      1                  4.1            29.1 |###       |    2020-01-29 =
13:04:12    2020-01-29 13:04:12
> log file sync             < 8192                      1                =
      1                  8.2            58.2 |######    |    2020-01-29 =
13:04:20    2020-01-29 13:04:20
>=20
> SQL> @ash/event_hist "write" 1=3D1  "TIMESTAMP'2020-01-29 13:04:00'" =
"TIMESTAMP'2020-01-29 13:05:00'"
>=20
> no rows selected
>=20
> SQL> @ash/event_hist "*" 1=3D1  "TIMESTAMP'2020-01-29 13:04:00'" =
"TIMESTAMP'2020-01-29 13:05:00'"
>         EVH_EVENT       EVH_MILLISEC    EVH_SAMPLE_COUNT    =
EVH_EST_EVENT_COUNT    EVH_EST_TOTAL_SEC    PCT_EVT_TIME       EVH_GRAPH =
            FIRST_SEEN              LAST_SEEN
> _________________ __________________ ___________________ =
______________________ ____________________ _______________ =
_______________ ______________________ ______________________
> log file sync              < 1024                      1               =
     1.8                  1.8            12.8 |#         |    2020-01-29 =
13:04:13    2020-01-29 13:04:13
> log file sync              < 4096                      1               =
       1                  4.1            29.1 |###       |    2020-01-29 =
13:04:12    2020-01-29 13:04:12
> log file sync              < 8192                      1               =
       1                  8.2            58.2 |######    |    2020-01-29 =
13:04:20    2020-01-29 13:04:20
> row cache lock                < 1                      1               =
  2469.1                  2.5             100 |##########|    2020-01-29 =
13:04:01    2020-01-29 13:04:01
>=20
> SQL> @ashtop =
username,event2,program2,blocking_session,blocking_inst_id =
"lower(program) like '%lg%'" "TIMESTAMP'2020-01-29 13:04:00'" =
"TIMESTAMP'2020-01-29 13:05:00'"
>=20
> no rows selected
>=20
> i'm going to copy over gv$active_session_history to a single instance =
database and take a look at ash_wait_chains but i don't expect that to =
reveal anything either.
>=20
> i need to confirm whether adding redo resolved issue (will have =
developer fire up workload) but I think next step is to investigate =
x$trace and possible process communication issues?
>=20
> On Thu, Jan 30, 2020 at 10:27 AM Chris Stephens <cstephens16@gmail.com =
<mailto:cstephens16@gmail.com>> wrote:
> Hey Tanel,
>=20
> Thanks for taking time to write this up! Will definitely be attending =
that webinar.
>=20
> comments inline
>=20
> On Wed, Jan 29, 2020 at 10:32 PM Tanel Poder <tanel@tanelpoder.com =
<mailto:tanel@tanelpoder.com>> wrote:
> Comments:
>=20
> 1) First, looks like you're using all the right tools for systematic =
troubleshooting ;-)
>=20
> ***i try!=20
>=20
> 2) As "log file sync" is not directly an I/O wait event (IO happens =
indirectly under the hood), but "wait for LGWR to ACK/increment redo =
persisting position" then there may be multiple reasons for delays. I'm =
repeating some of what others have said here too:
>   - LGWR could be waiting for remote ACK (synchronous data guard) - =
that should show up as a LNS* wait or something like that though
>   - Process slowness due to CPU overload/scheduling latency or =
swapping (in both cases, LGWR may do its work fast, but it takes a while =
for the foreground process to wake up and end the wait event - you could =
check if there are any reports regarding switching LGWR mode in =
alert.log around that time)
>   - Bug/missed post due to switching between LGWR progress polling vs. =
waiting for LGWR post
>   - ASM instance communication issues (ASM communication can be on the =
critical path when creating/changing files on ASM and even for reading, =
when the ASM metadata cache in your local instance doesn't have what it =
needs)
>   - So, the physical redo log file disk write is not necessarily the =
biggest contributor to the log file sync wait
>=20
> *** no data guard in the env.
> *** we have excess CPU. (20 cores / instance and usually only 1-5 =
active foreground processes but i will confirm from TFA collections)
> *** yesterday i was ready to blame bug. today, not so sure. i checked =
log file switches with a script google found me a few days ago. now =
thinking that was inaccurate as a script from metastink shows tones of =
switching so that could very well be related.
> ***asm communication is something i need to look into. TFA has =
reported some ASM issues but there's no details and i never saw anything =
in asm alert logs indicating an issue.
> =20
> 3) You'll need to measure what LGWR itself was doing during the =
problem time
>   - Since you're on 19c (12c or later) the LGWR has slaves too and =
they may be doing most of the work=20
>   - You'll need to look into both the LGWR process and any other LG* =
ones too
>   - Since you have (Oracle) Snapper running already and writing to =
trace you should have the data already
>   - The lgwr in the following @snapper ash,stats,trace 10 999999 lgwr =
syntax includes LGWR slaves too
>   - As LGWR slaves have their own sessions (and SIDs), you should run =
ashtop etc and report if there's any BLOCKING_SESSION value listed for =
LGWR and slaves
>   - If LGWR and the slaves were listed idle during the problem time =
(while others waited for log file sync event at the same time) and if =
there was no huge OS level CPU/memory shortage at the time, this looks =
like a background <-> foreground communication problem.=20
>      - This (and any potential ASM issues) could probably be =
troubleshooted via KST tracing, KSR channels and X$TRACE that could help =
here (some of it is enabled by default ( Like I showed in a previous =
hacking session: https://www.youtube.com/watch?v=3DmkmvZv58W6w =
<https://www.youtube.com/watch?v=3DmkmvZv58W6w> )
>=20
>=20
> here is all the output i have corresponding to a 6 sec "log file sync" =
wait:
>=20
> Experience duration:	R =3D t1 =E2=88=92 t0 =3D 41.336137=C2=A0seconds =
<x-msg://3/#m_-8103031405908768729_m_7542945560652427325_m_-55096910959236=
74063_profile-by-subroutine>
> Experience begin time:	t0 =3D 2020-01-29T13:04:07.560478
> Experience end time:	t1 =3D 2020-01-29T13:04:48.896615
> CALL-NAME                        DURATION       %   CALLS      MEAN    =
   MIN       MAX
> ------------------------------  ---------  ------  ------  --------  =
--------  --------
> SQL*Net message from client     26.017105   66.5%   9,886  0.002632  =
0.000209  7.858051
> log file sync                   10.582409   27.1%     150  0.070549  =
0.000295  6.909711
> EXEC                             2.088757    5.3%   9,702  0.000215  =
0.000000  0.209199
> DLM cross inst call completion   0.109617    0.3%     212  0.000517  =
0.000005  0.080575
> row cache lock                   0.070118    0.2%     125  0.000561  =
0.000208  0.000972
> FETCH                            0.047586    0.1%   1,029  0.000046  =
0.000000  0.003056
> PARSE                            0.045293    0.1%   4,778  0.000009  =
0.000000  0.002692
> CLOSE                            0.033453    0.1%   4,784  0.000007  =
0.000000  0.000237
> library cache lock               0.023614    0.1%      55  0.000429  =
0.000152  0.000901
> library cache pin                0.022311    0.1%      55  0.000406  =
0.000153  0.000689
> 17 others                        0.071430    0.2%  10,772  0.000007  =
0.000000  0.001487
> ------------------------------  ---------  ------  ------  --------  =
--------  --------
> TOTAL (27)                      39.111693  100.0%  41,548  0.000941  =
0.000000  7.858051
>=20
> 6 sec wait from raw data:
>=20
> *** 2020-01-29T13:04:20.724666-06:00
> WAIT #0: nam=3D'log file sync' ela=3D 6909711 buffer#=3D58996 sync =
scn=3D976089279 p3=3D0 obj#=3D-1 tim=3D1025228531958
> WAIT #0: nam=3D'SQL*Net message to client' ela=3D 2 driver =
id=3D1413697536 #bytes=3D1 p3=3D0 obj#=3D-1 tim=3D1025228532118
> WAIT #0: nam=3D'SQL*Net message from client' ela=3D 1711 driver =
id=3D1413697536 #bytes=3D1 p3=3D0 obj#=3D-1 tim=3D1025228533852
> BINDS #140204587734864:
>=20
> i enabled 10046 on lgwr:
>=20
> *** 2020-01-29T13:04:20.725994-06:00
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 2918443 timeout=3D300 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228533313
> WAIT #0: nam=3D'log file parallel write' ela=3D 279 files=3D1 =
blocks=3D17 requests=3D2 obj#=3D-1 tim=3D1025228533777
> WAIT #0: nam=3D'log file parallel write' ela=3D 171 files=3D1 blocks=3D3=
 requests=3D2 obj#=3D-1 tim=3D1025228534104
> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 22 copy latch #=3D28 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228534182
> WAIT #0: nam=3D'log file parallel write' ela=3D 183 files=3D1 blocks=3D2=
 requests=3D2 obj#=3D-1 tim=3D1025228534437
> WAIT #0: nam=3D'log file parallel write' ela=3D 174 files=3D1 blocks=3D2=
 requests=3D2 obj#=3D-1 tim=3D1025228534721
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 31 timeout=3D8 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228534786
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 3233 timeout=3D8 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228538038
> WAIT #0: nam=3D'log file parallel write' ela=3D 170 files=3D1 blocks=3D2=
 requests=3D1 obj#=3D-1 tim=3D1025228538297
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 25 timeout=3D8 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228538358
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 11829 timeout=3D8 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228550204
> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 52 copy latch #=3D8 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228550294
> WAIT #0: nam=3D'log file parallel write' ela=3D 239 files=3D1 blocks=3D7=
 requests=3D1 obj#=3D-1 tim=3D1025228550624
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 23 timeout=3D6 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228550699
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16750 timeout=3D6 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228567467
> WAIT #0: nam=3D'log file parallel write' ela=3D 207 files=3D1 blocks=3D9=
 requests=3D1 obj#=3D-1 tim=3D1025228567762
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 26 timeout=3D5 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228567825
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16249 timeout=3D5 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228584091
> WAIT #0: nam=3D'log file parallel write' ela=3D 213 files=3D1 blocks=3D9=
 requests=3D1 obj#=3D-1 tim=3D1025228584400
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 48 timeout=3D3 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228584502
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16466 timeout=3D3 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228600993
> WAIT #0: nam=3D'log file parallel write' ela=3D 228 files=3D1 blocks=3D9=
 requests=3D1 obj#=3D-1 tim=3D1025228601393
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 29 timeout=3D1 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228601482
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 9548 timeout=3D1 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228611053
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 7467 timeout=3D300 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228618574
> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 13 copy latch #=3D8 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228618636
> WAIT #0: nam=3D'log file parallel write' ela=3D 222 files=3D1 blocks=3D9=
 requests=3D1 obj#=3D-1 tim=3D1025228618960
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 31 timeout=3D300 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228619066
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 15946 timeout=3D300 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228635045
> WAIT #0: nam=3D'log file parallel write' ela=3D 240 files=3D1 blocks=3D9=
 requests=3D1 obj#=3D-1 tim=3D1025228635427
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 33 timeout=3D298 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228635525
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 15987 timeout=3D298 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228651537
> WAIT #0: nam=3D'log file parallel write' ela=3D 255 files=3D1 blocks=3D9=
 requests=3D1 obj#=3D-1 tim=3D1025228651912
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 51 timeout=3D296 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228652034
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16332 timeout=3D296 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228668405
> WAIT #0: nam=3D'log file parallel write' ela=3D 256 files=3D1 blocks=3D9=
 requests=3D1 obj#=3D-1 tim=3D1025228668804
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 37 timeout=3D295 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228668911
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16969 timeout=3D295 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228685907
> WAIT #0: nam=3D'log file parallel write' ela=3D 273 files=3D1 blocks=3D9=
 requests=3D1 obj#=3D-1 tim=3D1025228686327
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 38 timeout=3D293 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228686453
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 6908 timeout=3D293 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025228693390
> WAIT #0: nam=3D'log file parallel write' ela=3D 247 files=3D1 blocks=3D2=
 requests=3D1 obj#=3D-1 tim=3D1025228693768
> WAIT #0: nam=3D'rdbms ipc message' ela=3D 38 timeout=3D292 p2=3D0 p3=3D0=
 obj#=3D-1 tim=3D1025228693862j
>  ...more of the same
> 264396 WAIT #0: nam=3D'rdbms ipc message' ela=3D 32 timeout=3D52 p2=3D0 =
p3=3D0 obj#=3D-1 tim=3D1025231090742
>  264397 WAIT #0: nam=3D'rdbms ipc message' ela=3D 19460 timeout=3D52 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231110227
>  264398 WAIT #0: nam=3D'log file parallel write' ela=3D 272 files=3D1 =
blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231110633
>  264399 WAIT #0: nam=3D'rdbms ipc message' ela=3D 37 timeout=3D50 p2=3D0=
 p3=3D0 obj#=3D-1 tim=3D1025231110748
>  264400 WAIT #0: nam=3D'rdbms ipc message' ela=3D 15484 timeout=3D50 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231126261
>  264401 WAIT #0: nam=3D'log file parallel write' ela=3D 247 files=3D1 =
blocks=3D4 requests=3D1 obj#=3D-1 tim=3D1025231126641
>  264402 WAIT #0: nam=3D'rdbms ipc message' ela=3D 36 timeout=3D49 p2=3D0=
 p3=3D0 obj#=3D-1 tim=3D1025231126753
>  264403 WAIT #0: nam=3D'rdbms ipc message' ela=3D 16732 timeout=3D49 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231143514
>  264404 WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 31 copy latch =
#=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231143603
>  264405 WAIT #0: nam=3D'log file parallel write' ela=3D 279 files=3D1 =
blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231143976
>  264406 WAIT #0: nam=3D'rdbms ipc message' ela=3D 35 timeout=3D47 p2=3D0=
 p3=3D0 obj#=3D-1 tim=3D1025231144093
>  264407 WAIT #0: nam=3D'rdbms ipc message' ela=3D 5384 timeout=3D47 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231149515
>  264408 WAIT #0: nam=3D'log file parallel write' ela=3D 255 files=3D1 =
blocks=3D4 requests=3D1 obj#=3D-1 tim=3D1025231149889
>  264409 WAIT #0: nam=3D'rdbms ipc message' ela=3D 32 timeout=3D47 p2=3D0=
 p3=3D0 obj#=3D-1 tim=3D1025231149989
>  264410
>  264411 *** 2020-01-29T13:04:23.802774-06:00
>  264412 WAIT #0: nam=3D'rdbms ipc message' ela=3D 460056 timeout=3D46 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231610086
>  264413
>  264414 *** 2020-01-29T13:04:26.797925-06:00
>  264415 WAIT #0: nam=3D'rdbms ipc message' ela=3D 2995027 timeout=3D300 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025234605238
>  264416 WAIT #0: nam=3D'rdbms ipc message' ela=3D 10649 timeout=3D1 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025234616054
>  264417
>=20
> i will get you snapper lgwr output associated w/ this interval as soon =
as i can find it (can't remember where i stashed it and lost trace file =
name .. fml)
>=20
> 4) Paste the full Snapper output (of one time snapshot from the =
problem time) here if you can - we'll be able to see what the slaves =
were doing
>   - You'd need to include not only LGWR SID but also any LG* slave =
SIDs too
>   - Also we'll see better LGWR delay metrics from V$SESSTAT that go =
beyond just wait events
>   - Thinking of metrics like these - snapper already handles them (if =
you look into the rightmost column)
>      - redo synch time overhead (usec) - "FG wakeup overhead per log =
file sync"
>      - redo write time - accounting end to end redo write time (not =
just the log file parallel write syscall) "per redo write"
>=20
> 5) Regarding the ASH wait chains error on 19c - I'm aware of this but =
forgot to look into it
>    - It has (probably) something to do with GV$ views (or PX slaves =
used for PDB V$ queries) and possibly the connect by I'm using
>    - Does it work in the CDB (if multitenant)?
>    - Does it work with lower optimizer features enabled?
>    - As we are already manually looking into what LGWR and its slaves =
are doing, there's no need for the wait chains script here
>=20
> ***oracle has filed a bug. altering optimizer_features_enabled doesn't =
help.
> =20
> 6) Linux Process Snapper - pSnapper
>    - If you want to see the full "wait" states of Linux =
processes/threads, use the -a option.
>    - It's somewhat like always showing "idle" threads too
>    - But for linux apps it's harder to determine what's idle and =
what's some network RPC wait, so I currently just do what Linux "load" =
metric does
>    - I just shows threads in Runnable and Disk/Uninterruptible sleep =
states by default (-a shows all states, including Sleep and kernel-Idle =
threads etc)
>    - So the Linux pSnapper "idle" means a somewhat different thing =
than Oracle "idle" - say LGWR was waiting for a Data Guard ACK, that =
should not be shown as Idle in Oracle, but at the Linux process level =
it's "just some network socket wait" and you'd need to use -a option to =
show all of them
>=20
> *** will add that option to future collections (and watch the =
video/read help)=20
>=20
> 7) Were LGWR/LG* processes mostly idle during the problem time?
>    - As I kept reading the thread, it looks like ASH reported LGWR =
being mostly idle during the problem time
>    - This is not exactly the same pattern, but I've seen cases in the =
past where everyone's waiting for LGWR and LGWR doesn't seem to be busy =
and doesn't show any significant waits
>    - It turned out that LGWR was actually waiting for CKPT to make =
progress, but it wasn't synchronously waiting, but just going to sleep =
and "checking again later" in a loop
>    - I wrote about this here: =
https://blog.tanelpoder.com/posts/log-file-switch-checkpoint-incomplete-an=
d-lgwr-waiting-for-checkpoint-progress/ =
<https://blog.tanelpoder.com/posts/log-file-switch-checkpoint-incomplete-a=
nd-lgwr-waiting-for-checkpoint-progress/>
>    - So it may still make sense to check what LGWR/LG* processes are =
doing even if they don't seem to be active much (like do you see an =
occasional controlfile read wait by LGWR etc)
>    - SQL*Trace would capture all activity of LGWR, but may slow things =
down further, so you could also sample & spool V$SESSION_WAIT_HISTORY =
(that keeps a log of last 10 waits of every session in memory)
>=20
> *** i think i've addressed most of this.
> =20
> Incidentally I'm doing another hacking session tomorrow (Thursday 30 =
Jan @ 12pm ET) about complex Oracle performance problems that span =
multiple layers and have multiple moving parts. I can add some LGWR =
troubleshooting magic in the end of that (here goes my plan to fit the =
session in 1.5h again!). I'll post the details in my blog in a few =
minutes.
>=20
> *** can't wait!!=20
> --
> Tanel Poder
> https://blog.tanelpoder.com/seminar =
<https://blog.tanelpoder.com/seminar>
>=20
>=20
> On Tue, Jan 28, 2020 at 2:22 PM Chris Stephens <cstephens16@gmail.com =
<mailto:cstephens16@gmail.com>> wrote:
> doesn't look like any ash data for lgwr:
>=20
> SQL> @ashtop session_id,blocking_session,event2 1=3D1 =
"TIMESTAMP'2020-01-27 15:50:48'" "TIMESTAMP'2020-01-27 15:52:40'"
>    TOTALSECONDS    AAS      %This    SESSION_ID    BLOCKING_SESSION    =
            EVENT2             FIRST_SEEN              LAST_SEEN    =
DIST_SQLEXEC_SEEN
> _______________ ______ __________ _____________ ___________________ =
_____________________ ______________________ ______________________ =
____________________
>              81    0.7   82% |             1713                1710 =
log file sync         2020-01-27 15:50:54    2020-01-27 15:52:14         =
              1
>               5      0    5% |             1221                     ON =
CPU                2020-01-27 15:50:59    2020-01-27 15:52:14            =
           1
>               3      0    3% |              246                     ON =
CPU                2020-01-27 15:51:00    2020-01-27 15:52:00            =
           1
>               3      0    3% |             1713                     ON =
CPU                2020-01-27 15:50:51    2020-01-27 15:52:27            =
           2
>               1      0    1% |              123                     ON =
CPU                2020-01-27 15:51:35    2020-01-27 15:51:35            =
           1
>               1      0    1% |              127                     ON =
CPU                2020-01-27 15:51:50    2020-01-27 15:51:50            =
           1
>               1      0    1% |              252                2321 =
latch free            2020-01-27 15:52:28    2020-01-27 15:52:28         =
              1
>               1      0    1% |              978                     =
ges remote message    2020-01-27 15:50:48    2020-01-27 15:50:48         =
              1
>               1      0    1% |              983                     =
latch free            2020-01-27 15:52:28    2020-01-27 15:52:28         =
              1
>               1      0    1% |             1713                     =
library cache lock    2020-01-27 15:50:48    2020-01-27 15:50:48         =
              1
>               1      0    1% |             1831                     ON =
CPU                2020-01-27 15:50:49    2020-01-27 15:50:49            =
           1
>=20
>=20
> 11 rows selected.
>=20
>=20
> SQL> @ashtop session_id,blocking_session,event2 session_id=3D1710 =
"TIMESTAMP'2020-01-27 15:50:48'" "TIMESTAMP'2020-01-27 15:52:40'"
>=20
> no rows selected
>=20
> On Tue, Jan 28, 2020 at 1:09 PM Noveljic Nenad =
<nenad.noveljic@vontobel.com <mailto:nenad.noveljic@vontobel.com>> =
wrote:
> You=E2=80=99re using v0.2, but  v0.6 should be the latest version. 0.2 =
wouldn=E2=80=99t show idle blockers.
>=20
> I looked at the ashtop output once again. Strange that lgwr processes =
nor any other bg processes show up significantly there. The question is, =
what was lgwr doing while the fg processes were waiting on him.
>=20


--Apple-Mail=_68151583-789F-4A89-9EA9-0126EF206DEE
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
 charset=utf-8

<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html; =
charset=3Dutf-8"></head><body style=3D"word-wrap: break-word; =
-webkit-nbsp-mode: space; line-break: after-white-space;" class=3D"">I =
didn=E2=80=99t look that deep, I tried to get an idea of the situation =
and the current state.<div class=3D""><br class=3D""></div><div =
class=3D"">This, as others have concluded already, then looks like the =
bug that seems to surface every once in a while where sessions that use =
post-wait for waiting for the logwriter get =E2=80=9Cforgotten=E2=80=9D =
when the instance switches to polling.</div><div class=3D"">The =
switching is visible, or at least was visible when I looked at it in =
versions 11.2 and 12.1.0.2, by looking at the logwriter trace file, it =
mentions when the logwriter switches between post-wait and polling and =
back.</div><div class=3D"">However, in these cases, the process would =
way longer, until the logwriter decided to switch back to post-wait and =
found the waiting sessions again.</div><div class=3D""><br =
class=3D""></div><div class=3D"">When this is a bug with the above =
described issue, of course it needs to be fixed. However, there is a way =
around it:</div><div class=3D"">- In post-wait mode, a process waiting =
in =E2=80=98log file sync=E2=80=99 can only end the wait if two =
conditions are met: it needs to be posted and the on_disk_scn must be =
beyond the process=E2=80=99 commit SCN. Based on what is shown, I assume =
the logwriter happily gone on and had the on disk scn updated (it does =
that always every 3 seconds) probably beyond the process=E2=80=99 commit =
scn. So the most logical thing the process is waiting for, is to be =
posted, which did not happen.</div><div class=3D"">- In polling mode, a =
process will wait for a pre-computed time, during which it can not be =
notified, and then check for the on disk scn. The pre-computed time in =
normal circumstances is really low, probably around a millisecond, but =
computed based on the latency of your system. The important bit is there =
is no need for the logwriter to do anything: the waiting is based on the =
progress that the logwriter makes by means of the on disk scn, but it =
doesn=E2=80=99t require additional work, like being =
posted.&nbsp;</div><div class=3D"">- Therefore, if you fix your =
instances to use polling only, it can not come in the situation of a =
post-wait process being forgotten. This can be achieved by setting =
=E2=80=9C_use_adaptive_log_file_sync=E2=80=9D to polling_only. This is =
an underscore parameter, and therefore you have to ask blessing from =
oracle support to use it, but since you already got a SR open, this =
should be simple to do.</div><div class=3D""><br class=3D""></div><div =
class=3D"">I can=E2=80=99t think of any downsides of using polling_only. =
Maybe someone can?</div><div class=3D"">Even the use of it, since it=E2=80=
=99s switched to dynamically by all supported versions, is high.<br =
class=3D""><div class=3D"">
<div style=3D"color: rgb(0, 0, 0); letter-spacing: normal; orphans: =
auto; text-align: start; text-indent: 0px; text-transform: none; =
white-space: normal; widows: auto; word-spacing: 0px; =
-webkit-text-stroke-width: 0px; word-wrap: break-word; =
-webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" =
class=3D""><div style=3D"color: rgb(0, 0, 0); letter-spacing: normal; =
orphans: auto; text-align: start; text-indent: 0px; text-transform: =
none; white-space: normal; widows: auto; word-spacing: 0px; =
-webkit-text-stroke-width: 0px; word-wrap: break-word; =
-webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" =
class=3D""><div style=3D"color: rgb(0, 0, 0); font-family: Helvetica; =
font-style: normal; font-variant: normal; font-weight: normal; =
letter-spacing: normal; line-height: normal; orphans: 2; text-align: =
-webkit-auto; text-indent: 0px; text-transform: none; white-space: =
normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; =
word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: =
after-white-space;" class=3D""><div style=3D"color: rgb(0, 0, 0); =
font-family: Helvetica; font-style: normal; font-variant: normal; =
font-weight: normal; letter-spacing: normal; line-height: normal; =
orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: =
none; white-space: normal; widows: 2; word-spacing: 0px; =
-webkit-text-stroke-width: 0px; word-wrap: break-word; =
-webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" =
class=3D""><div style=3D"color: rgb(0, 0, 0); font-family: Helvetica; =
font-style: normal; font-variant: normal; font-weight: normal; =
letter-spacing: normal; line-height: normal; orphans: 2; text-align: =
-webkit-auto; text-indent: 0px; text-transform: none; white-space: =
normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; =
word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: =
after-white-space;" class=3D""><span class=3D"Apple-style-span" =
style=3D"border-collapse: separate; color: rgb(0, 0, 0); font-family: =
Helvetica; font-style: normal; font-variant: normal; font-weight: =
normal; letter-spacing: normal; line-height: normal; orphans: 2; =
text-align: -webkit-auto; text-indent: 0px; text-transform: none; =
white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; =
-webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: =
0px;"><br class=3D"Apple-interchange-newline">Frits Hoogland<br =
class=3D""><br class=3D""><a href=3D"http://fritshoogland.wordpress.com/" =
class=3D"">http://fritshoogland.wordpress.com</a><br class=3D""><a =
href=3D"mailto:frits.hoogland@gmail.com" =
class=3D"">frits.hoogland@gmail.com</a><br class=3D""><div =
class=3D""><span style=3D"text-align: -webkit-auto;" class=3D"">Mobile: =
+31 6 14180860</span></div></span></div></div></div></div></div>
</div>
<div><br class=3D""><blockquote type=3D"cite" class=3D""><div =
class=3D"">On 2 Feb 2020, at 04:50, Tanel Poder &lt;<a =
href=3D"mailto:tanel@tanelpoder.com" =
class=3D"">tanel@tanelpoder.com</a>&gt; wrote:</div><br =
class=3D"Apple-interchange-newline"><div class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D"">But your trace file here already indicates that =
there's some glitch regarding LGWR wakeups/posting:<div class=3D""><br =
class=3D""></div><div class=3D""><b class=3D"">Foreground:</b></div><div =
class=3D""><br class=3D""></div><div class=3D""><div class=3D"">*** =
2020-01-29T<b class=3D""><font color=3D"#0000ff" =
class=3D"">13:04:20</font></b>.724666-06:00<br class=3D"">WAIT #0: =
nam=3D'log file sync' ela=3D 6909711 buffer#=3D58996 sync scn=3D976089279 =
p3=3D0 obj#=3D-1 tim=3D1025228531958<br class=3D""></div><div =
class=3D""><br class=3D""></div><div class=3D""><b class=3D"">LGWR at =
the same time:</b></div><div class=3D""><br class=3D""></div><div =
class=3D"">*** 2020-01-29T<b class=3D""><font color=3D"#0000ff" =
class=3D"">13:04:20</font></b>.725994-06:00<br class=3D"">WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 2918443 timeout=3D300 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228533313</div></div><div class=3D""><br =
class=3D""></div><div class=3D"">The WAIT lines are printed out when the =
wait ends (and the *** timestamp line just before the WAIT line is =
printed out, if enough time has passed since the last *** line). So this =
shows that LGWR had happily been idle, waiting for its idle 3-second =
timeout to expire, or someone to wake it up (that's what "rdbms ipc =
message" means) - while someone had issued a commit and was waiting for =
LGWR acknowledgment. So this shows that LGWR didn't even get the post =
from your session, nor any other sessions (if you had concurrent =
transactional activity). Back in 11g, typically&nbsp;these problems =
happened when adaptive log file syncing switch between poll and =
post/wait wakeup method happened and you hit a bug. But in 12c there are =
more moving parts.</div><div class=3D""><br class=3D""></div><div =
class=3D"">--</div><div class=3D"">Tanel Poder</div><div class=3D""><a =
href=3D"https://blog.tanelpoder.com/seminar" =
class=3D"">https://blog.tanelpoder.com/seminar</a></div></div></div></div>=
<br class=3D""><div class=3D"gmail_quote"><div dir=3D"ltr" =
class=3D"gmail_attr">On Sat, Feb 1, 2020 at 10:34 PM Tanel Poder &lt;<a =
href=3D"mailto:tanel@tanelpoder.com" =
class=3D"">tanel@tanelpoder.com</a>&gt; wrote:<br =
class=3D""></div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D"">Hi Chris,<div class=3D""><br =
class=3D""></div><div class=3D"">As 12c+ has log writer slaves and =
Oracle can adaptavely switch&nbsp;from single LGWR to LGWR&nbsp;+ LGxx =
slave (_use_single_log_writer=3Dadaptive), can you check ASH data for =
the LGxx slaves at this time too. Not all process-to-process =
interactions/waits are instrumented with a blocking_session in =
Oracle...</div><div class=3D""><br class=3D""></div><div class=3D"">Should=
 be this command:</div><div class=3D""><br class=3D""></div><div =
class=3D""><div class=3D""><font face=3D"monospace" class=3D"">@ashtop =
username,event2,program2,blocking_session,blocking_inst_id "program like =
'%(LG%)'" "TIMESTAMP'2020-01-29 13:04:00'" "TIMESTAMP'2020-01-29 =
13:05:00'"</font></div></div><div class=3D""><br class=3D""></div><div =
class=3D"">--<br class=3D""></div><div class=3D"">Tanel Poder</div><div =
class=3D""><a href=3D"https://blog.tanelpoder.com/seminar" =
target=3D"_blank" =
class=3D"">https://blog.tanelpoder.com/seminar</a></div><div =
class=3D""><br class=3D""></div></div></div></div></div><br =
class=3D""><div class=3D"gmail_quote"><div dir=3D"ltr" =
class=3D"gmail_attr">On Thu, Jan 30, 2020 at 3:44 PM Chris Stephens =
&lt;<a href=3D"mailto:cstephens16@gmail.com" target=3D"_blank" =
class=3D"">cstephens16@gmail.com</a>&gt; wrote:<br =
class=3D""></div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><font =
face=3D"monospace" class=3D"">Here's some additional ash output from the =
trace data posted in last message including event_hist.sql:</font><div =
class=3D""><font face=3D"monospace" class=3D""><br =
class=3D""></font></div><div class=3D""><font face=3D"monospace" =
class=3D"">SQL&gt; @ashtop =
username,event2,program2,blocking_session,blocking_inst_id 1=3D1 =
"TIMESTAMP'2020-01-29 13:04:20'" "TIMESTAMP'2020-01-29 13:04:27'"<br =
class=3D"">&nbsp; &nbsp;TOTALSECONDS &nbsp; &nbsp;AAS &nbsp; &nbsp; =
&nbsp;%This &nbsp; &nbsp; &nbsp; &nbsp; USERNAME &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; EVENT2 &nbsp; &nbsp; &nbsp;PROGRAM2 &nbsp; =
&nbsp;BLOCKING_SESSION &nbsp; &nbsp;BLOCKING_INST_ID &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; FIRST_SEEN &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;LAST_SEEN &nbsp; &nbsp;DIST_SQLEXEC_SEEN<br =
class=3D"">_______________ ______ __________ ________________ =
________________ _____________ ___________________ ___________________ =
______________________ ______________________ ____________________<br =
class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; =
&nbsp;0.3 &nbsp; 50% | &nbsp; &nbsp;SYS &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;ON CPU &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (sqlplus) =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; 2020-01-29 13:04:24 &nbsp; &nbsp;2020-01-29 13:04:25 =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; 1<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
1 &nbsp; &nbsp;0.1 &nbsp; 25% | &nbsp; &nbsp;GEN3_MGOWER_3 &nbsp; =
&nbsp;ON CPU &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (python) &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;2020-01-29 13:04:22 &nbsp; &nbsp;2020-01-29 13:04:22 &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1<br =
class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1 &nbsp; =
&nbsp;0.1 &nbsp; 25% | &nbsp; &nbsp;GEN3_MGOWER_3 &nbsp; &nbsp;log file =
sync &nbsp; &nbsp;(python) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; 1710 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; 2 2020-01-29 13:04:20 &nbsp; =
&nbsp;2020-01-29 13:04:20 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1<br class=3D""><br class=3D""><br =
class=3D"">SQL&gt; @bg lgwr<br class=3D"">&nbsp; &nbsp;NAME &nbsp; =
&nbsp;DESCRIPTION &nbsp; &nbsp; SID &nbsp; &nbsp;OPID &nbsp; &nbsp; SPID =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; PADDR &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; SADDR<br class=3D"">_______ =
______________ _______ _______ ________ ___________________ =
___________________<br class=3D"">LGWR &nbsp; &nbsp;Redo etc. &nbsp; =
&nbsp; &nbsp; &nbsp; 1710 &nbsp; &nbsp; &nbsp;34 26552 &nbsp; =
&nbsp;00000001E8718860 &nbsp; &nbsp;00000001D8BFF4A0<br =
class=3D""></font></div><div class=3D""><font face=3D"monospace" =
class=3D""><br class=3D""></font></div><div class=3D""><font =
face=3D"monospace" class=3D"">SQL&gt; @ashtop =
username,event2,program2,blocking_session,blocking_inst_id =
session_id=3D1710 "TIMESTAMP'2020-01-29 13:04:20'" "TIMESTAMP'2020-01-29 =
13:04:27'"<br class=3D""><br class=3D"">no rows selected<br class=3D""><br=
 class=3D"">SQL&gt; @ashtop =
username,event2,program2,blocking_session,blocking_inst_id =
session_id=3D1710 "TIMESTAMP'2020-01-29 13:04:00'" "TIMESTAMP'2020-01-29 =
13:05:00'"<br class=3D""><br class=3D"">no rows selected<br =
class=3D""></font></div><div class=3D""><font face=3D"monospace" =
class=3D""><br class=3D""></font></div><div class=3D""><font =
face=3D"monospace" class=3D"">SQL&gt; @ash/event_hist "log file" 1=3D1 =
&nbsp;"TIMESTAMP'2020-01-29 13:04:00'" "TIMESTAMP'2020-01-29 =
13:05:00'"<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp;EVH_EVENT &nbsp; =
&nbsp; &nbsp; EVH_MILLISEC &nbsp; &nbsp;EVH_SAMPLE_COUNT &nbsp; =
&nbsp;EVH_EST_EVENT_COUNT &nbsp; &nbsp;EVH_EST_TOTAL_SEC &nbsp; =
&nbsp;PCT_EVT_TIME &nbsp; &nbsp; &nbsp; EVH_GRAPH &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; FIRST_SEEN &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;LAST_SEEN<br class=3D"">________________ __________________ =
___________________ ______________________ ____________________ =
_______________ _______________ ______________________ =
______________________<br class=3D"">log file sync &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &lt; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1.8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp;1.8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;12.8 |# &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; &nbsp;2020-01-29 =
13:04:13 &nbsp; &nbsp;2020-01-29 13:04:13<br class=3D"">log file sync =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &lt; 4096 &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4.1 &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp;29.1 |### &nbsp; &nbsp; &nbsp; | &nbsp; =
&nbsp;2020-01-29 13:04:12 &nbsp; &nbsp;2020-01-29 13:04:12<br =
class=3D"">log file sync &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &lt; =
8192 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;8.2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;58.2 |###### =
&nbsp; &nbsp;| &nbsp; &nbsp;2020-01-29 13:04:20 &nbsp; &nbsp;2020-01-29 =
13:04:20<br class=3D""></font></div><div class=3D""><font =
face=3D"monospace" class=3D""><br class=3D""></font></div><div =
class=3D""><font face=3D"monospace" class=3D"">SQL&gt; @ash/event_hist =
"write" 1=3D1 &nbsp;"TIMESTAMP'2020-01-29 13:04:00'" =
"TIMESTAMP'2020-01-29 13:05:00'"<br class=3D""><br class=3D"">no rows =
selected<br class=3D""><br class=3D"">SQL&gt; @ash/event_hist "*" 1=3D1 =
&nbsp;"TIMESTAMP'2020-01-29 13:04:00'" "TIMESTAMP'2020-01-29 =
13:05:00'"<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; EVH_EVENT &nbsp; =
&nbsp; &nbsp; EVH_MILLISEC &nbsp; &nbsp;EVH_SAMPLE_COUNT &nbsp; =
&nbsp;EVH_EST_EVENT_COUNT &nbsp; &nbsp;EVH_EST_TOTAL_SEC &nbsp; =
&nbsp;PCT_EVT_TIME &nbsp; &nbsp; &nbsp; EVH_GRAPH &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; FIRST_SEEN &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;LAST_SEEN<br class=3D"">_________________ =
__________________ ___________________ ______________________ =
____________________ _______________ _______________ =
______________________ ______________________<br class=3D"">log file =
sync &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&lt; 1024 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1.8 =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1.8 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;12.8 |# &nbsp; &nbsp; &nbsp; &nbsp; | =
&nbsp; &nbsp;2020-01-29 13:04:13 &nbsp; &nbsp;2020-01-29 13:04:13<br =
class=3D"">log file sync &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;&lt; 4096 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;4.1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;29.1 =
|### &nbsp; &nbsp; &nbsp; | &nbsp; &nbsp;2020-01-29 13:04:12 &nbsp; =
&nbsp;2020-01-29 13:04:12<br class=3D"">log file sync &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&lt; 8192 &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8.2 &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;58.2 |###### &nbsp; &nbsp;| &nbsp; &nbsp;2020-01-29 =
13:04:20 &nbsp; &nbsp;2020-01-29 13:04:20<br class=3D"">row cache lock =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&lt; 1 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2469.1 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2.5 &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; 100 |##########| &nbsp; &nbsp;2020-01-29 =
13:04:01 &nbsp; &nbsp;2020-01-29 13:04:01<br class=3D""></font></div><div =
class=3D""><font face=3D"monospace" class=3D""><br =
class=3D""></font></div><div class=3D""><font face=3D"monospace" =
class=3D"">SQL&gt; @ashtop =
username,event2,program2,blocking_session,blocking_inst_id =
"lower(program) like '%lg%'" "TIMESTAMP'2020-01-29 13:04:00'" =
"TIMESTAMP'2020-01-29 13:05:00'"<br class=3D""><br class=3D"">no rows =
selected<br class=3D""></font></div><div class=3D""><font =
face=3D"monospace" class=3D""><br class=3D""></font></div><div =
class=3D""><font face=3D"monospace" class=3D"">i'm going to copy over =
gv$active_session_history to a single instance database and take a look =
at ash_wait_chains but i don't expect that to reveal anything =
either.</font></div><div class=3D""><font face=3D"monospace" =
class=3D""><br class=3D""></font></div><div class=3D""><font =
face=3D"monospace" class=3D"">i need to confirm whether adding redo =
resolved issue (will have developer fire up workload) but I think next =
step is to investigate x$trace and possible process communication =
issues?</font></div></div><br class=3D""><div class=3D"gmail_quote"><div =
dir=3D"ltr" class=3D"gmail_attr">On Thu, Jan 30, 2020 at 10:27 AM Chris =
Stephens &lt;<a href=3D"mailto:cstephens16@gmail.com" target=3D"_blank" =
class=3D"">cstephens16@gmail.com</a>&gt; wrote:<br =
class=3D""></div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D"">Hey&nbsp;Tanel,<div class=3D""><br =
class=3D""></div><div class=3D"">Thanks for taking time to write this =
up! Will definitely&nbsp;be attending that webinar.</div><div =
class=3D""><br class=3D""></div><div class=3D"">comments =
inline</div></div><br class=3D""><div class=3D"gmail_quote"><div =
dir=3D"ltr" class=3D"gmail_attr">On Wed, Jan 29, 2020 at 10:32 PM Tanel =
Poder &lt;<a href=3D"mailto:tanel@tanelpoder.com" target=3D"_blank" =
class=3D"">tanel@tanelpoder.com</a>&gt; wrote:<br =
class=3D""></div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
class=3D"">Comments:</div><div class=3D""><br class=3D""></div><div =
class=3D"">1) First, looks like you're using all the right tools for =
systematic troubleshooting =
;-)</div></div></div></div></div></div></div></div></div></blockquote><div=
 class=3D""><br class=3D""></div><div class=3D"">***i =
try!&nbsp;</div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div class=3D""><br =
class=3D""></div><div class=3D"">2) As "log file sync" is not directly =
an I/O wait event (IO happens indirectly under the hood), but "wait for =
LGWR to ACK/increment redo persisting position" then there may be =
multiple reasons for delays. I'm repeating some of what others have said =
here too:</div><div class=3D"">&nbsp; - LGWR could be waiting for remote =
ACK (synchronous data guard) - that should show up as a LNS* wait or =
something like that though</div><div class=3D"">&nbsp; - Process =
slowness due to CPU overload/scheduling latency&nbsp;or swapping (in =
both cases, LGWR may do its work fast, but it takes a while for the =
foreground process to wake up and end the wait event - you could check =
if there are any reports regarding switching LGWR mode in alert.log =
around that time)</div><div class=3D"">&nbsp; - Bug/missed post due to =
switching between LGWR progress polling vs. waiting for LGWR =
post</div><div class=3D"">&nbsp; - ASM instance communication issues =
(ASM communication can be on the critical path when creating/changing =
files on ASM and even for reading, when the ASM metadata cache in your =
local instance doesn't have what it needs)</div><div class=3D"">&nbsp; - =
So, the physical redo log file disk write is not necessarily the biggest =
contributor to the log file sync wait</div><div class=3D""><br =
class=3D""></div></div></div></div></div></div></div></div></div></blockqu=
ote><div class=3D"">*** no data guard in the env.</div><div class=3D"">***=
 we have excess CPU. (20 cores / instance and usually only 1-5 active =
foreground processes but i will confirm from TFA collections)</div><div =
class=3D"">*** yesterday i was ready to blame bug. today, not so sure. i =
checked log file switches with a script google found me a few days ago. =
now thinking that was inaccurate as a script from metastink shows tones =
of switching so that could very well be related.</div><div =
class=3D"">***asm communication is something i need to look into. TFA =
has reported some ASM issues but there's no details and i never saw =
anything in asm alert logs indicating an issue.</div><div =
class=3D"">&nbsp;</div><blockquote class=3D"gmail_quote" =
style=3D"margin:0px 0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
class=3D""></div><div class=3D"">3) You'll need to measure what LGWR =
itself was doing during the problem time</div><div class=3D"">&nbsp; - =
Since you're on 19c (12c or later) the LGWR has slaves too and they may =
be doing most of the work&nbsp;</div><div class=3D"">&nbsp; - You'll =
need to look into both the LGWR process and any other LG* ones =
too</div><div class=3D"">&nbsp; - Since you have (Oracle) Snapper =
running already and writing to trace you should have the data =
already</div><div dir=3D"ltr" class=3D"">&nbsp; - The <b =
class=3D"">lgwr</b>&nbsp;in the following <font face=3D"monospace" =
class=3D"">@snapper ash,stats,trace 10 999999 lgwr</font> syntax =
includes LGWR slaves too<br class=3D""></div><div dir=3D"ltr" =
class=3D"">&nbsp; - As LGWR slaves have their own sessions (and SIDs), =
you should run ashtop etc and report if there's =
any&nbsp;BLOCKING_SESSION value listed for LGWR and slaves<br =
class=3D""></div><div class=3D"">&nbsp; - If LGWR and the slaves were =
listed <i class=3D"">idle</i> during the problem time (while others =
waited for log file sync event at the same time) <i class=3D"">and =
if&nbsp;</i>there was no huge OS level CPU/memory shortage at the time, =
this looks like a background &lt;-&gt; foreground communication =
problem.&nbsp;</div><div class=3D"">&nbsp; &nbsp; &nbsp;- This (and any =
potential ASM issues) could probably be troubleshooted via KST tracing, =
KSR channels and X$TRACE that could help here (some of it is enabled by =
default ( Like I showed in a previous hacking session: <a =
href=3D"https://www.youtube.com/watch?v=3DmkmvZv58W6w" target=3D"_blank" =
class=3D"">https://www.youtube.com/watch?v=3DmkmvZv58W6w</a> )</div><div =
class=3D""><br =
class=3D""></div></div></div></div></div></div></div></div></div></blockqu=
ote><div class=3D""><br class=3D""></div><div class=3D"">here is all the =
output i have corresponding to a 6 sec "log file sync" wait:</div><div =
class=3D""><br class=3D""></div><div class=3D""><table =
style=3D"vertical-align: baseline; font-size: 12px; line-height: 13.5pt; =
font-family: Lato, verdana, &quot;bitstream vera sans&quot;, =
&quot;dejavu sans&quot;, &quot;liberation sans&quot;, geneva, =
sans-serif; margin: 2ex 0px 2ex 40px; border-collapse: collapse; border: =
0px;" class=3D""><tbody style=3D"vertical-align:baseline" class=3D""><tr =
style=3D"vertical-align:baseline" class=3D""><td =
style=3D"vertical-align:baseline;padding:0.5ex 0px 0.5ex =
1em;text-align:right;font-size:10.2px;border:0px;white-space:nowrap" =
class=3D"">Experience duration:</td><td =
style=3D"vertical-align:baseline;padding:0.5ex 1em;width:100em" =
class=3D""><em style=3D"vertical-align:baseline" =
class=3D"">R</em>&nbsp;=3D&nbsp;<em style=3D"vertical-align:baseline" =
class=3D"">t</em><span style=3D"vertical-align:0px;font-size:7.2px" =
class=3D"">1</span>&nbsp;=E2=88=92&nbsp;<em =
style=3D"vertical-align:baseline" class=3D"">t</em><span =
style=3D"vertical-align:0px;font-size:7.2px" =
class=3D"">0</span>&nbsp;=3D&nbsp;<span title=3D"This report is relevant =
only if the end-user experience you are analyzing has consumed the =
duration shown here. If this duration does not correspond to your actual =
end-user performance experience, then you should repair your trace data =
or collect trace data more carefully for another experience. Time your =
data collection activation and deactivation to match precisely the begin =
and end times of the experience you are analyzing." =
style=3D"vertical-align:baseline;color:tomato;font-weight:bold" =
class=3D""><a =
href=3D"x-msg://3/#m_-8103031405908768729_m_7542945560652427325_m_-5509691=
095923674063_profile-by-subroutine" =
style=3D"vertical-align:baseline;text-decoration-line:none;color:tomato" =
class=3D"">41.336137&nbsp;seconds</a></span></td></tr><tr =
style=3D"vertical-align:baseline" class=3D""><td =
style=3D"vertical-align:baseline;padding:0.5ex 0px 0.5ex =
1em;text-align:right;font-size:10.2px;border:0px;white-space:nowrap" =
class=3D"">Experience begin time:</td><td =
style=3D"vertical-align:baseline;padding:0.5ex 1em;width:100em" =
class=3D""><em style=3D"vertical-align:baseline" class=3D"">t</em><span =
style=3D"vertical-align:0px;font-size:7.2px" class=3D"">0</span>&nbsp;=3D =
2020-01-29T13:04:07.560478</td></tr><tr style=3D"vertical-align:baseline" =
class=3D""><td style=3D"vertical-align:baseline;padding:0.5ex 0px 0.5ex =
1em;text-align:right;font-size:10.2px;border:0px;white-space:nowrap" =
class=3D"">Experience end time:</td><td =
style=3D"vertical-align:baseline;padding:0.5ex 1em;width:100em" =
class=3D""><em style=3D"vertical-align:baseline" class=3D"">t</em><span =
style=3D"vertical-align:0px;font-size:7.2px" class=3D"">1</span>&nbsp;=3D =
2020-01-29T13:04:48.896615</td></tr></tbody></table></div>CALL-NAME =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;DURATION &nbsp; &nbsp; &nbsp; % &nbsp; CALLS &nbsp; &nbsp; =
&nbsp;MEAN &nbsp; &nbsp; &nbsp; MIN &nbsp; &nbsp; &nbsp; MAX<br =
class=3D"">------------------------------ &nbsp;--------- &nbsp;------ =
&nbsp;------ &nbsp;-------- &nbsp;-------- &nbsp;--------<br =
class=3D"">SQL*Net message from client &nbsp; &nbsp; 26.017105 &nbsp; =
66.5% &nbsp; 9,886 &nbsp;0.002632 &nbsp;0.000209 &nbsp;7.858051<br =
class=3D"">log file sync &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; 10.582409 &nbsp; 27.1% &nbsp; &nbsp; 150 =
&nbsp;0.070549 &nbsp;0.000295 &nbsp;6.909711<br class=3D"">EXEC &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; 2.088757 &nbsp; &nbsp;5.3% &nbsp; 9,702 =
&nbsp;0.000215 &nbsp;0.000000 &nbsp;0.209199<br class=3D"">DLM cross =
inst call completion &nbsp; 0.109617 &nbsp; &nbsp;0.3% &nbsp; &nbsp; 212 =
&nbsp;0.000517 &nbsp;0.000005 &nbsp;0.080575<br class=3D"">row cache =
lock &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
0.070118 &nbsp; &nbsp;0.2% &nbsp; &nbsp; 125 &nbsp;0.000561 =
&nbsp;0.000208 &nbsp;0.000972<br class=3D"">FETCH &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;0.047586 &nbsp; &nbsp;0.1% &nbsp; 1,029 &nbsp;0.000046 =
&nbsp;0.000000 &nbsp;0.003056<br class=3D"">PARSE &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;0.045293 &nbsp; &nbsp;0.1% &nbsp; 4,778 &nbsp;0.000009 =
&nbsp;0.000000 &nbsp;0.002692<br class=3D"">CLOSE &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;0.033453 &nbsp; &nbsp;0.1% &nbsp; 4,784 &nbsp;0.000007 =
&nbsp;0.000000 &nbsp;0.000237<br class=3D"">library cache lock &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.023614 &nbsp; &nbsp;0.1% =
&nbsp; &nbsp; &nbsp;55 &nbsp;0.000429 &nbsp;0.000152 &nbsp;0.000901<br =
class=3D"">library cache pin &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;0.022311 &nbsp; &nbsp;0.1% &nbsp; &nbsp; &nbsp;55 =
&nbsp;0.000406 &nbsp;0.000153 &nbsp;0.000689<br class=3D"">17 others =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;0.071430 &nbsp; &nbsp;0.2% &nbsp;10,772 &nbsp;0.000007 =
&nbsp;0.000000 &nbsp;0.001487<br class=3D"">------------------------------=
 &nbsp;--------- &nbsp;------ &nbsp;------ &nbsp;-------- &nbsp;-------- =
&nbsp;--------<br class=3D""><div class=3D"">TOTAL (27) &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;39.111693 =
&nbsp;100.0% &nbsp;41,548 &nbsp;0.000941 &nbsp;0.000000 =
&nbsp;7.858051</div><div class=3D""><br class=3D""></div><div class=3D"">6=
 sec wait from raw data:</div><div class=3D""><br class=3D""></div><div =
class=3D"">*** 2020-01-29T13:04:20.724666-06:00<br class=3D"">WAIT #0: =
nam=3D'log file sync' ela=3D 6909711 buffer#=3D58996 sync scn=3D976089279 =
p3=3D0 obj#=3D-1 tim=3D1025228531958<br class=3D"">WAIT #0: nam=3D'SQL*Net=
 message to client' ela=3D 2 driver id=3D1413697536 #bytes=3D1 p3=3D0 =
obj#=3D-1 tim=3D1025228532118<br class=3D"">WAIT #0: nam=3D'SQL*Net =
message from client' ela=3D 1711 driver id=3D1413697536 #bytes=3D1 p3=3D0 =
obj#=3D-1 tim=3D1025228533852<br class=3D"">BINDS #140204587734864:<br =
class=3D""></div><div class=3D""><br class=3D""></div><div class=3D"">i =
enabled 10046 on lgwr:</div><div class=3D""><br class=3D""></div><div =
class=3D"">*** 2020-01-29T13:04:20.725994-06:00<br class=3D"">WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 2918443 timeout=3D300 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228533313<br class=3D"">WAIT #0: nam=3D'log file =
parallel write' ela=3D 279 files=3D1 blocks=3D17 requests=3D2 obj#=3D-1 =
tim=3D1025228533777<br class=3D"">WAIT #0: nam=3D'log file parallel =
write' ela=3D 171 files=3D1 blocks=3D3 requests=3D2 obj#=3D-1 =
tim=3D1025228534104<br class=3D"">WAIT #0: nam=3D'LGWR wait for redo =
copy' ela=3D 22 copy latch #=3D28 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025228534182<br class=3D"">WAIT #0: nam=3D'log file parallel =
write' ela=3D 183 files=3D1 blocks=3D2 requests=3D2 obj#=3D-1 =
tim=3D1025228534437<br class=3D"">WAIT #0: nam=3D'log file parallel =
write' ela=3D 174 files=3D1 blocks=3D2 requests=3D2 obj#=3D-1 =
tim=3D1025228534721<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 31 timeout=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228534786<br =
class=3D"">WAIT #0: nam=3D'rdbms ipc message' ela=3D 3233 timeout=3D8 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228538038<br class=3D"">WAIT #0: =
nam=3D'log file parallel write' ela=3D 170 files=3D1 blocks=3D2 =
requests=3D1 obj#=3D-1 tim=3D1025228538297<br class=3D"">WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 25 timeout=3D8 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025228538358<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 11829 timeout=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228550204<br =
class=3D"">WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 52 copy latch =
#=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228550294<br class=3D"">WAIT #0: =
nam=3D'log file parallel write' ela=3D 239 files=3D1 blocks=3D7 =
requests=3D1 obj#=3D-1 tim=3D1025228550624<br class=3D"">WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 23 timeout=3D6 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025228550699<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 16750 timeout=3D6 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228567467<br =
class=3D"">WAIT #0: nam=3D'log file parallel write' ela=3D 207 files=3D1 =
blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025228567762<br class=3D"">WAIT =
#0: nam=3D'rdbms ipc message' ela=3D 26 timeout=3D5 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228567825<br class=3D"">WAIT #0: nam=3D'rdbms ipc =
message' ela=3D 16249 timeout=3D5 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025228584091<br class=3D"">WAIT #0: nam=3D'log file parallel =
write' ela=3D 213 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 =
tim=3D1025228584400<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 48 timeout=3D3 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228584502<br =
class=3D"">WAIT #0: nam=3D'rdbms ipc message' ela=3D 16466 timeout=3D3 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228600993<br class=3D"">WAIT #0: =
nam=3D'log file parallel write' ela=3D 228 files=3D1 blocks=3D9 =
requests=3D1 obj#=3D-1 tim=3D1025228601393<br class=3D"">WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 29 timeout=3D1 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025228601482<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 9548 timeout=3D1 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228611053<br =
class=3D"">WAIT #0: nam=3D'rdbms ipc message' ela=3D 7467 timeout=3D300 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228618574<br class=3D"">WAIT #0: =
nam=3D'LGWR wait for redo copy' ela=3D 13 copy latch #=3D8 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228618636<br class=3D"">WAIT #0: nam=3D'log file =
parallel write' ela=3D 222 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 =
tim=3D1025228618960<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 31 timeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228619066<br =
class=3D"">WAIT #0: nam=3D'rdbms ipc message' ela=3D 15946 timeout=3D300 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228635045<br class=3D"">WAIT #0: =
nam=3D'log file parallel write' ela=3D 240 files=3D1 blocks=3D9 =
requests=3D1 obj#=3D-1 tim=3D1025228635427<br class=3D"">WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 33 timeout=3D298 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025228635525<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 15987 timeout=3D298 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228651537<br =
class=3D"">WAIT #0: nam=3D'log file parallel write' ela=3D 255 files=3D1 =
blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025228651912<br class=3D"">WAIT =
#0: nam=3D'rdbms ipc message' ela=3D 51 timeout=3D296 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228652034<br class=3D"">WAIT #0: nam=3D'rdbms ipc =
message' ela=3D 16332 timeout=3D296 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025228668405<br class=3D"">WAIT #0: nam=3D'log file parallel =
write' ela=3D 256 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 =
tim=3D1025228668804<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 37 timeout=3D295 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228668911<br =
class=3D"">WAIT #0: nam=3D'rdbms ipc message' ela=3D 16969 timeout=3D295 =
p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228685907<br class=3D"">WAIT #0: =
nam=3D'log file parallel write' ela=3D 273 files=3D1 blocks=3D9 =
requests=3D1 obj#=3D-1 tim=3D1025228686327<br class=3D"">WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 38 timeout=3D293 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025228686453<br class=3D"">WAIT #0: nam=3D'rdbms ipc message' =
ela=3D 6908 timeout=3D293 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228693390<br =
class=3D"">WAIT #0: nam=3D'log file parallel write' ela=3D 247 files=3D1 =
blocks=3D2 requests=3D1 obj#=3D-1 tim=3D1025228693768<br class=3D"">WAIT =
#0: nam=3D'rdbms ipc message' ela=3D 38 timeout=3D292 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025228693862j<br class=3D""></div><div =
class=3D"">&nbsp;...more of the same</div><div class=3D"">264396 WAIT =
#0: nam=3D'rdbms ipc message' ela=3D 32 timeout=3D52 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231090742<br class=3D"">&nbsp;264397 WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 19460 timeout=3D52 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231110227<br class=3D"">&nbsp;264398 WAIT #0: =
nam=3D'log file parallel write' ela=3D 272 files=3D1 blocks=3D9 =
requests=3D1 obj#=3D-1 tim=3D1025231110633<br class=3D"">&nbsp;264399 =
WAIT #0: nam=3D'rdbms ipc message' ela=3D 37 timeout=3D50 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231110748<br class=3D"">&nbsp;264400 WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 15484 timeout=3D50 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231126261<br class=3D"">&nbsp;264401 WAIT #0: =
nam=3D'log file parallel write' ela=3D 247 files=3D1 blocks=3D4 =
requests=3D1 obj#=3D-1 tim=3D1025231126641<br class=3D"">&nbsp;264402 =
WAIT #0: nam=3D'rdbms ipc message' ela=3D 36 timeout=3D49 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231126753<br class=3D"">&nbsp;264403 WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 16732 timeout=3D49 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231143514<br class=3D"">&nbsp;264404 WAIT #0: =
nam=3D'LGWR wait for redo copy' ela=3D 31 copy latch #=3D8 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231143603<br class=3D"">&nbsp;264405 WAIT #0: =
nam=3D'log file parallel write' ela=3D 279 files=3D1 blocks=3D9 =
requests=3D1 obj#=3D-1 tim=3D1025231143976<br class=3D"">&nbsp;264406 =
WAIT #0: nam=3D'rdbms ipc message' ela=3D 35 timeout=3D47 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231144093<br class=3D"">&nbsp;264407 WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 5384 timeout=3D47 p2=3D0 p3=3D0 obj#=3D-1=
 tim=3D1025231149515<br class=3D"">&nbsp;264408 WAIT #0: nam=3D'log file =
parallel write' ela=3D 255 files=3D1 blocks=3D4 requests=3D1 obj#=3D-1 =
tim=3D1025231149889<br class=3D"">&nbsp;264409 WAIT #0: nam=3D'rdbms ipc =
message' ela=3D 32 timeout=3D47 p2=3D0 p3=3D0 obj#=3D-1 =
tim=3D1025231149989<br class=3D"">&nbsp;264410<br class=3D"">&nbsp;264411 =
*** 2020-01-29T13:04:23.802774-06:00<br class=3D"">&nbsp;264412 WAIT #0: =
nam=3D'rdbms ipc message' ela=3D 460056 timeout=3D46 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231610086<br class=3D"">&nbsp;264413<br =
class=3D"">&nbsp;264414 *** 2020-01-29T13:04:26.797925-06:00<br =
class=3D"">&nbsp;264415 WAIT #0: nam=3D'rdbms ipc message' ela=3D =
2995027 timeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025234605238<br =
class=3D"">&nbsp;264416 WAIT #0: nam=3D'rdbms ipc message' ela=3D 10649 =
timeout=3D1 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025234616054<br =
class=3D"">&nbsp;264417<br class=3D""></div><div class=3D""><br =
class=3D""></div><div class=3D"">i will get you snapper lgwr output =
associated w/ this interval as soon as i can find it (can't remember =
where i stashed it and lost trace file name .. fml)</div><div =
class=3D""><br class=3D""></div><blockquote class=3D"gmail_quote" =
style=3D"margin:0px 0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
class=3D""></div><div class=3D"">4) Paste the full <b =
class=3D"">Snapper</b> output (of one time snapshot from the problem =
time) here if you can - we'll be able to see what the slaves were =
doing</div><div class=3D"">&nbsp; - You'd need to include not only LGWR =
SID but also any LG*&nbsp;slave SIDs too</div><div class=3D"">&nbsp; - =
Also we'll see better LGWR delay metrics from V$SESSTAT that go beyond =
just wait events</div><div class=3D"">&nbsp; - Thinking of metrics like =
these - snapper already handles them (if you look into the rightmost =
column)</div><div class=3D""><div class=3D"">&nbsp; &nbsp; &nbsp;- <b =
class=3D"">redo synch time overhead (usec)</b> - "FG wakeup overhead per =
log file sync"</div><div class=3D"">&nbsp; &nbsp; &nbsp;- <b =
class=3D"">redo write time</b> - accounting end to end redo write time =
(not just the log file parallel write syscall) "per redo =
write"</div></div><div class=3D""><br class=3D""></div><div class=3D"">5) =
Regarding the <b class=3D"">ASH wait chains</b> error on 19c - I'm aware =
of this but forgot to look into it</div><div class=3D"">&nbsp; &nbsp;- =
It has (probably) something to do with GV$ views (or PX slaves used for =
PDB V$ queries) and possibly the connect by I'm using</div><div =
class=3D"">&nbsp; &nbsp;- Does it work in the CDB (if =
multitenant)?</div><div class=3D"">&nbsp; &nbsp;- Does it work with =
lower optimizer features enabled?</div><div class=3D"">&nbsp; &nbsp;- As =
we are already manually looking into what LGWR and its slaves are doing, =
there's no need for the wait chains script here</div><div class=3D""><br =
class=3D""></div></div></div></div></div></div></div></div></div></blockqu=
ote><div class=3D"">***oracle has filed a bug. altering =
optimizer_features_enabled doesn't help.</div><div =
class=3D"">&nbsp;</div><blockquote class=3D"gmail_quote" =
style=3D"margin:0px 0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
class=3D""></div><div class=3D"">6) Linux Process Snapper - =
pSnapper</div><div class=3D"">&nbsp; &nbsp;- If you want to see the full =
"wait" states of Linux processes/threads, use the<b class=3D""> -a =
</b>option.</div><div class=3D"">&nbsp; &nbsp;- It's somewhat like =
always showing "idle" threads too</div><div class=3D"">&nbsp; &nbsp;- =
But for linux apps it's harder to determine what's idle and what's some =
network RPC wait, so I currently just do what Linux "load" metric =
does</div><div class=3D"">&nbsp; &nbsp;- I just shows threads in =
Runnable and Disk/Uninterruptible sleep states by default (-a =
shows&nbsp;all states, including Sleep and kernel-Idle threads =
etc)</div><div class=3D"">&nbsp; &nbsp;- So the Linux pSnapper "idle" =
means a somewhat different thing than Oracle "idle" - say LGWR was =
waiting for a Data Guard ACK, that should not be shown as Idle in =
Oracle, but at the Linux process level it's "just some network socket =
wait" and you'd need to use <b class=3D"">-a</b>&nbsp;option to show all =
of =
them</div></div></div></div></div></div></div></div></div></blockquote><di=
v class=3D""><br class=3D""></div><div class=3D"">*** will add that =
option to future collections (and watch the video/read =
help)&nbsp;</div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div class=3D""><br =
class=3D""></div><div class=3D"">7) Were LGWR/LG* processes mostly idle =
during the problem time?</div><div class=3D"">&nbsp; &nbsp;- As I kept =
reading the thread, it looks like ASH reported LGWR being mostly idle =
during the problem time</div><div class=3D"">&nbsp; &nbsp;- This is not =
exactly the same pattern, but I've seen cases in the past where =
everyone's waiting for LGWR and LGWR doesn't seem to be busy and doesn't =
show any significant waits</div><div class=3D"">&nbsp; &nbsp;- It turned =
out that LGWR was actually waiting for CKPT to make progress, but it =
wasn't synchronously waiting, but just going to sleep and "checking =
again later" in a loop</div><div class=3D"">&nbsp; &nbsp;- I wrote about =
this here: <a =
href=3D"https://blog.tanelpoder.com/posts/log-file-switch-checkpoint-incom=
plete-and-lgwr-waiting-for-checkpoint-progress/" target=3D"_blank" =
class=3D"">https://blog.tanelpoder.com/posts/log-file-switch-checkpoint-in=
complete-and-lgwr-waiting-for-checkpoint-progress/</a></div><div =
class=3D"">&nbsp; &nbsp;- So it may still make sense to check what =
LGWR/LG* processes are doing even if they don't seem to be active much =
(like do you see an occasional controlfile read wait by LGWR =
etc)</div><div class=3D"">&nbsp; &nbsp;- SQL*Trace would capture all =
activity of LGWR, but may slow things down further, so you could also =
sample &amp; spool V$SESSION_WAIT_HISTORY (that keeps a log of last 10 =
waits of every session in memory)</div><div class=3D""><br =
class=3D""></div></div></div></div></div></div></div></div></div></blockqu=
ote><div class=3D"">*** i think i've addressed most of this.</div><div =
class=3D"">&nbsp;</div><blockquote class=3D"gmail_quote" =
style=3D"margin:0px 0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
class=3D""></div><div class=3D"">Incidentally I'm doing another hacking =
session tomorrow (Thursday 30 Jan @ 12pm ET) about complex Oracle =
performance problems that span multiple layers and have multiple moving =
parts. I can add some LGWR troubleshooting magic in the end of that =
(here goes my plan to fit the session in 1.5h again!). I'll post the =
details in my blog in a few minutes.</div><div class=3D""><br =
class=3D""></div></div></div></div></div></div></div></div></div></blockqu=
ote><div class=3D"">*** can't wait!!&nbsp;</div><blockquote =
class=3D"gmail_quote" style=3D"margin:0px 0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" =
class=3D""><div dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
dir=3D"ltr" class=3D""><div dir=3D"ltr" class=3D""><div =
class=3D""></div><div class=3D"">--</div><div class=3D"">Tanel =
Poder</div><div class=3D""><a href=3D"https://blog.tanelpoder.com/seminar"=
 target=3D"_blank" =
class=3D"">https://blog.tanelpoder.com/seminar</a></div><div =
class=3D""><br class=3D""></div></div></div></div></div></div></div><br =
class=3D""><div class=3D"gmail_quote"><div dir=3D"ltr" =
class=3D"gmail_attr">On Tue, Jan 28, 2020 at 2:22 PM Chris Stephens =
&lt;<a href=3D"mailto:cstephens16@gmail.com" target=3D"_blank" =
class=3D"">cstephens16@gmail.com</a>&gt; wrote:<br =
class=3D""></div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex"><div dir=3D"ltr" class=3D""><div =
class=3D""><font face=3D"monospace" class=3D"">doesn't look like any ash =
data for lgwr:</font></div><div class=3D""><font face=3D"monospace" =
class=3D""><br class=3D""></font></div><font face=3D"monospace" =
class=3D"">SQL&gt; @ashtop session_id,blocking_session,event2 1=3D1 =
"TIMESTAMP'2020-01-27 15:50:48'" "TIMESTAMP'2020-01-27 15:52:40'"<br =
class=3D"">&nbsp; &nbsp;TOTALSECONDS &nbsp; &nbsp;AAS &nbsp; &nbsp; =
&nbsp;%This &nbsp; &nbsp;SESSION_ID &nbsp; &nbsp;BLOCKING_SESSION &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;EVENT2 &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; FIRST_SEEN &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;LAST_SEEN &nbsp; &nbsp;DIST_SQLEXEC_SEEN<br =
class=3D"">_______________ ______ __________ _____________ =
___________________ _____________________ ______________________ =
______________________ ____________________<br class=3D"">&nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;81 &nbsp; &nbsp;0.7 &nbsp; 82% | =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1713 &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1710 log file sync &nbsp; &nbsp; =
&nbsp; &nbsp; 2020-01-27 15:50:54 &nbsp; &nbsp;2020-01-27 15:52:14 =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; 1<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
5 &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;5% | &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; 1221 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; ON CPU &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;2020-01-27 15:50:59 &nbsp; &nbsp;2020-01-27 15:52:14 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
1<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 3 =
&nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;3% | &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;246 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; ON CPU &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;2020-01-27 15:51:00 &nbsp; &nbsp;2020-01-27 15:52:00 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
1<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 3 =
&nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;3% | &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; 1713 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; ON CPU &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;2020-01-27 15:50:51 &nbsp; &nbsp;2020-01-27 15:52:27 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
2<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1 =
&nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;1% | &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;123 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; ON CPU &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;2020-01-27 15:51:35 &nbsp; &nbsp;2020-01-27 15:51:35 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
1<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1 =
&nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;1% | &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;127 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; ON CPU &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;2020-01-27 15:51:50 &nbsp; &nbsp;2020-01-27 15:51:50 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
1<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1 =
&nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;1% | &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp;252 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;2321 latch free &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp;2020-01-27 15:52:28 &nbsp; &nbsp;2020-01-27 15:52:28 &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1<br =
class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1 &nbsp; =
&nbsp; &nbsp;0 &nbsp; &nbsp;1% | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;978 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; ges remote message &nbsp; &nbsp;2020-01-27 15:50:48 &nbsp; =
&nbsp;2020-01-27 15:50:48 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1<br class=3D"">&nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;1% | =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;983 &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; latch free &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2020-01-27 15:52:28 &nbsp; =
&nbsp;2020-01-27 15:52:28 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1<br class=3D"">&nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;1% | =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1713 &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; library cache lock =
&nbsp; &nbsp;2020-01-27 15:50:48 &nbsp; &nbsp;2020-01-27 15:50:48 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
1<br class=3D"">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1 =
&nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp;1% | &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; 1831 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp; &nbsp; ON CPU &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
&nbsp; &nbsp;2020-01-27 15:50:49 &nbsp; &nbsp;2020-01-27 15:50:49 &nbsp; =
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =
1<br class=3D""><br class=3D""><br class=3D"">11 rows selected.<br =
class=3D""><br class=3D""><br class=3D"">SQL&gt; @ashtop =
session_id,blocking_session,event2 session_id=3D1710 =
"TIMESTAMP'2020-01-27 15:50:48'" "TIMESTAMP'2020-01-27 15:52:40'"<br =
class=3D""><br class=3D"">no rows selected</font><br class=3D""></div><br =
class=3D""><div class=3D"gmail_quote"><div dir=3D"ltr" =
class=3D"gmail_attr">On Tue, Jan 28, 2020 at 1:09 PM Noveljic Nenad =
&lt;<a href=3D"mailto:nenad.noveljic@vontobel.com" target=3D"_blank" =
class=3D"">nenad.noveljic@vontobel.com</a>&gt; wrote:<br =
class=3D""></div><blockquote class=3D"gmail_quote" style=3D"margin:0px =
0px 0px =
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(=
204,204,204);padding-left:1ex">




<div class=3D"">
You=E2=80=99re using v0.2, but &nbsp;v0.6 should be the latest version. =
0.2 wouldn=E2=80=99t show idle blockers.
<div class=3D""><br class=3D"">
</div>
<div class=3D"">I looked at the ashtop output once again. Strange that =
lgwr processes nor any other bg processes show up significantly there. =
The question is, what was lgwr doing while the fg processes were waiting =
on him.<br class=3D"">
<br class=3D""></div>

</div>

</blockquote></div>
</blockquote></div></div></div>
</blockquote></div></div>
</blockquote></div>
</blockquote></div></div>
</blockquote></div></div>
</div></blockquote></div><br class=3D""></div></body></html>=

--Apple-Mail=_68151583-789F-4A89-9EA9-0126EF206DEE--
--
http://www.freelists.org/webpage/oracle-l


