Return-Path: <oracle-l-bounce@freelists.org>
X-Original-To: oracle-l@orafaq.com
Delivered-To: oracle-l@orafaq.com
Received: from turing.freelists.org (turing.freelists.org [206.53.239.180])
 by malta2546.startdedicated.com (Postfix) with ESMTPS id 06C47100333EB8
 for <oracle-l@orafaq.com>; Sun,  2 Feb 2020 23:11:00 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 78C39248B6;
 Sun,  2 Feb 2020 17:10:58 -0500 (EST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=freelists.org;
 s=turing; t=1580681458;
 bh=cizB1Xa+Dnp2Zlhk9ofimuiwiBe7TLHaGHrZgwy2iUE=;
 h=From:Sender:Sender:From;
 b=q5Hi+EdWhUZ+p4iW5HUr+6OcHtXHQEODYxvdWqC9nRyYXCt5ukNkIroeog2QRSUSl
	 a2mMdEINJJUhGKH64Ff9nUKitnd1CxxczhpycMbcXv9XiH77uDhsGfVzfv3ieM66vc
	 0l9PpaKG2sTZDOQ+qX1XPZQU2KGopDHKtXCRl580=
X-Virus-Scanned: Debian amavisd-new at turing.freelists.org
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id K74xRYPfSaS9; Sun,  2 Feb 2020 17:10:57 -0500 (EST)
Received: from turing.freelists.org (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 25E9320D8B;
 Sun,  2 Feb 2020 17:10:10 -0500 (EST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=freelists.org;
 s=turing; t=1580681454;
 bh=cizB1Xa+Dnp2Zlhk9ofimuiwiBe7TLHaGHrZgwy2iUE=;
 h=From:Sender:Sender:From;
 b=iCSddoKdw3Z5l+Sb/f6Tb1dj5KFGx5gKlqWwE302oKIuERRQTVygE5t9h1vxevKxV
	 2XiT+HpUaRKDhAQqwPgfXk9sx8oPMUT+08EBqHxqLlOntGSC/QWKANRAOP+Q+W4Ncd
	 /x6TwHqPcL7r/kg2dYPEhz+Nq7Eo464djMp3C4ww=
Received: with ECARTIS (v1.0.0; list oracle-l); Sun, 02 Feb 2020 17:09:25 -0500 (EST)
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id C34CD207D4
 for <Oracle-L@freelists.org>; Sun,  2 Feb 2020 17:09:24 -0500 (EST)
Authentication-Results: turing.freelists.org;
 dkim=pass (2048-bit key; unprotected) header.d=tanelpoder-com.20150623.gappssmtp.com header.i=@tanelpoder-com.20150623.gappssmtp.com header.b="WYU6uMal";
 dkim-atps=neutral
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id Mes_iGf32sZE for <Oracle-L@freelists.org>;
 Sun,  2 Feb 2020 17:09:24 -0500 (EST)
Received: from mail-lf1-f48.google.com (mail-lf1-f48.google.com [209.85.167.48])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id F038F20461
 for <Oracle-L@freelists.org>; Sun,  2 Feb 2020 17:09:23 -0500 (EST)
Received: by mail-lf1-f48.google.com with SMTP id n25so8382368lfl.0
        for <Oracle-L@freelists.org>; Sun, 02 Feb 2020 14:09:23 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=mFnv+tHxXTdunuYFm6gZR4hta48sQ7mR5gZTsoURyXs=;
        b=Jq5M7ihl1UYUQxm+E3J6yIS88krKW+K22HouVjop4S4HO/Ljy78w1Szzlm3i779+8H
         sFRg3Bod5KaEiAUvyeoExy7UUJJsJrjiAr15WW5HfDffwfq3AFLJ9BtImDRwPMLlfL4Z
         n8KCfqB1eyf//1pDPLgCGMrGREZXOo3mzF/FfA6VxgNfe7IiwiiE9MflKvmhO6Kp4yeO
         g1GvZyBxzTg5MLv3opOEfNKuxdr7d9UmfBEkPROJdDN3AAAr/Y3KNP9Win1sf++UUU/H
         +G6UG658edBLD+tHh00FdOKmmEx9kM+VHVWVkyOQa1N0k+Ru+X1KOp656cP+KzMsNQXQ
         K2EQ==
X-Gm-Message-State: APjAAAWd70EeEgtCvAJPcyF+jdvOPJDJAbf7BUW1wzXtuyTDOrVjmggJ
 +NJ1CbX+3NWvkLouIlqOLI693Ba9jcKF4+NaAS3nmA==
X-Google-Smtp-Source: APXvYqzvNoMuwmvZz96VuMNQk0YQwoQLmQXipw5JIC9GQHj4Z+W7kCI4MPzqXb/k3JOTKza+fpPOOha5O39lrZXiGN4=
X-Received: by 2002:ac2:5196:: with SMTP id u22mr9798862lfi.123.1580681362337;
 Sun, 02 Feb 2020 14:09:22 -0800 (PST)
MIME-Version: 1.0
References: <CAEFL0szEBH0P-hqb39Pmjw-pdJ60OZ+RUuFoqCXsC7+=iPEFzQ@mail.gmail.com>
 <38763_1580228753_5E306091_38763_12713_1_cf89e1868e2e40949f6177bf1c604696@vontobel.com>
 <CAEFL0swS2CGeDF2pGWHDBQkO8J1cH7cMbDCxBngx613UW52A+Q@mail.gmail.com>
 <39760_1580232621_5E306FAD_39760_6519_1_238f704e660247f78d5e2f6a5c98b27f@vontobel.com>
 <CAEFL0szP5+-ZUog8Q=P0TiusLw6MpDDM46hZM=7ps0v0o3p_OQ@mail.gmail.com>
 <38763_1580238546_5E3086D2_38763_18592_1_8d9cba2aa89d4aba82f0b60173245848@vontobel.com>
 <CAEFL0szn17WU9kSj7ipRhieXd4w_3hzhU5gSmxDxhx1=WnqDkw@mail.gmail.com>
 <CAMHX9J+FooJWWr896aFx7oqeMk=PKA__GaHjxjU_gsOyE1Tzjg@mail.gmail.com>
 <CAEFL0sx7fAF89Qike6PS2Jy9-N5S-w-r31XQ7kj8AEGP5FonTg@mail.gmail.com>
 <CAEFL0syR4Xy2fPiL4M3jtmmfj_02MmYgC_haCM=LDZ17ikuGgg@mail.gmail.com>
 <CAMHX9JLchyEEvh4mHH8hHXEa53r5FP4Ck-kpCMLUiTgOJzYuEw@mail.gmail.com> <CAEFL0szoZkns_vDHZBmX22BVCznDwG3LN+bLUQY3EuqrvoQqhw@mail.gmail.com>
In-Reply-To: <CAEFL0szoZkns_vDHZBmX22BVCznDwG3LN+bLUQY3EuqrvoQqhw@mail.gmail.com>
From: Tanel Poder <tanel@tanelpoder.com>
Date: Sun, 2 Feb 2020 17:09:10 -0500
Message-ID: <CAMHX9JKBmpR3JBOL21hFPGkL5J6bzVh86ucCkbAdTQNcNxCcUA@mail.gmail.com>
Subject: Re: intermittent long "log file sync" waits
To: Chris Stephens <cstephens16@gmail.com>
Cc: Tanel Poder <tanel@tanelpoder.com>, Noveljic Nenad <nenad.noveljic@vontobel.com>,
 oracle-l <Oracle-L@freelists.org>
Content-Type: multipart/alternative; boundary="0000000000001b00b1059d9f0f0f"
X-archive-position: 76138
X-ecartis-version: Ecartis v1.0.0
Sender: oracle-l-bounce@freelists.org
Errors-to: oracle-l-bounce@freelists.org
X-original-sender: tanel@tanelpoder.com
Precedence: normal
Reply-To: tanel@tanelpoder.com
List-Help: <mailto:ecartis@freelists.org?Subject=help>
List-Unsubscribe: <mailto:oracle-l-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: oracle-l <oracle-l.freelists.org>
X-List-ID: oracle-l <oracle-l.freelists.org>
List-Subscribe: <mailto:oracle-l-request@freelists.org?Subject=subscribe>
List-Owner: <mailto:>
List-post: <mailto:oracle-l@freelists.org>
List-Archive: <http://www.freelists.org/archives/oracle-l>
X-list: oracle-l
--0000000000001b00b1059d9f0f0f
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Btw, adaptive log file sync is a different thing than the adaptive switch
between LGWR-only processing vs. LGWR+LG% slave processing (that's why I
mentioned this parameter:  _use_single_log_writer=3Dadaptive)


On Sun, Feb 2, 2020 at 10:20 AM Chris Stephens <cstephens16@gmail.com>
wrote:

> SQL> @ashtopLFS username,event2,program2,blocking_session,blocking_inst_i=
d
> "program like '%(LG%)'" "TIMESTAMP'2020-01-29 13:04:00'"
> "TIMESTAMP'2020-01-29 13:05:00'"
> no rows selected
> SQL>
>
> i pulled over all the tables/views referenced in ashtop.sql and
> ash_wait_chains.sql to a single instance database. Hence, "ashtopLFS.sql"
>
> From the beginning we noticed slow writes to local disc (>40ms at times).
> On Friday, as I was running out of ideas, the other DBA on the project
> suggested we create a ramdisk for /tmp as we noticed lots of writes to
> /tmp/.oracle. After doing that and rebooting all nodes, we are now unable
> to reproduce the issue.
>
> I'd already turned off adaptive log file syn:
>
> SQL> @pd "_use_adaptive"
> Show all parameters and session values from x$ksppi/x$ksppcv...
>
>        NUM N_HEX NAME
> VALUE                          DESCRIPTION
> ---------- ----- --------------------------------------------------------
> ------------------------------
> -------------------------------------------------------------------------=
--------------------------
>       2049   801 _use_adaptive_log_file_sync              FALSE
>           Adaptively switch between post/wait and polling
>
> SQL>
>
> On Monday (or Tuesday if I end up taking tomorrow off), we plan to remove
> the ramdisk and see if the issue reproduces again.
>
>
>
> On Sat, Feb 1, 2020 at 9:34 PM Tanel Poder <tanel@tanelpoder.com> wrote:
>
>> Hi Chris,
>>
>> As 12c+ has log writer slaves and Oracle can adaptavely switch from
>> single LGWR to LGWR + LGxx slave (_use_single_log_writer=3Dadaptive), ca=
n you
>> check ASH data for the LGxx slaves at this time too. Not all
>> process-to-process interactions/waits are instrumented with a
>> blocking_session in Oracle...
>>
>> Should be this command:
>>
>> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>> "program like '%(LG%)'" "TIMESTAMP'2020-01-29 13:04:00'"
>> "TIMESTAMP'2020-01-29 13:05:00'"
>>
>> --
>> Tanel Poder
>> https://blog.tanelpoder.com/seminar
>>
>>
>> On Thu, Jan 30, 2020 at 3:44 PM Chris Stephens <cstephens16@gmail.com>
>> wrote:
>>
>>> Here's some additional ash output from the trace data posted in last
>>> message including event_hist.sql:
>>>
>>> SQL> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>>> 1=3D1 "TIMESTAMP'2020-01-29 13:04:20'" "TIMESTAMP'2020-01-29 13:04:27'"
>>>    TOTALSECONDS    AAS      %This         USERNAME           EVENT2
>>>  PROGRAM2    BLOCKING_SESSION    BLOCKING_INST_ID             FIRST_SEE=
N
>>>            LAST_SEEN    DIST_SQLEXEC_SEEN
>>> _______________ ______ __________ ________________ ________________
>>> _____________ ___________________ ___________________
>>> ______________________ ______________________ ____________________
>>>               2    0.3   50% |    SYS              ON CPU
>>> (sqlplus)                                             2020-01-29 13:04:=
24
>>>  2020-01-29 13:04:25                       1
>>>               1    0.1   25% |    GEN3_MGOWER_3    ON CPU
>>> (python)                                              2020-01-29 13:04:=
22
>>>  2020-01-29 13:04:22                       1
>>>               1    0.1   25% |    GEN3_MGOWER_3    log file sync
>>>  (python)                     1710                   2 2020-01-29 13:04=
:20
>>>    2020-01-29 13:04:20                       1
>>>
>>>
>>> SQL> @bg lgwr
>>>    NAME    DESCRIPTION     SID    OPID     SPID               PADDR
>>>           SADDR
>>> _______ ______________ _______ _______ ________ ___________________
>>> ___________________
>>> LGWR    Redo etc.         1710      34 26552    00000001E8718860
>>>  00000001D8BFF4A0
>>>
>>> SQL> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>>> session_id=3D1710 "TIMESTAMP'2020-01-29 13:04:20'" "TIMESTAMP'2020-01-2=
9
>>> 13:04:27'"
>>>
>>> no rows selected
>>>
>>> SQL> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>>> session_id=3D1710 "TIMESTAMP'2020-01-29 13:04:00'" "TIMESTAMP'2020-01-2=
9
>>> 13:05:00'"
>>>
>>> no rows selected
>>>
>>> SQL> @ash/event_hist "log file" 1=3D1  "TIMESTAMP'2020-01-29 13:04:00'"
>>> "TIMESTAMP'2020-01-29 13:05:00'"
>>>        EVH_EVENT       EVH_MILLISEC    EVH_SAMPLE_COUNT
>>>  EVH_EST_EVENT_COUNT    EVH_EST_TOTAL_SEC    PCT_EVT_TIME       EVH_GRA=
PH
>>>           FIRST_SEEN              LAST_SEEN
>>> ________________ __________________ ___________________
>>> ______________________ ____________________ _______________ ___________=
____
>>> ______________________ ______________________
>>> log file sync             < 1024                      1
>>>    1.8                  1.8            12.8 |#         |    2020-01-29
>>> 13:04:13    2020-01-29 13:04:13
>>> log file sync             < 4096                      1
>>>      1                  4.1            29.1 |###       |    2020-01-29
>>> 13:04:12    2020-01-29 13:04:12
>>> log file sync             < 8192                      1
>>>      1                  8.2            58.2 |######    |    2020-01-29
>>> 13:04:20    2020-01-29 13:04:20
>>>
>>> SQL> @ash/event_hist "write" 1=3D1  "TIMESTAMP'2020-01-29 13:04:00'"
>>> "TIMESTAMP'2020-01-29 13:05:00'"
>>>
>>> no rows selected
>>>
>>> SQL> @ash/event_hist "*" 1=3D1  "TIMESTAMP'2020-01-29 13:04:00'"
>>> "TIMESTAMP'2020-01-29 13:05:00'"
>>>         EVH_EVENT       EVH_MILLISEC    EVH_SAMPLE_COUNT
>>>  EVH_EST_EVENT_COUNT    EVH_EST_TOTAL_SEC    PCT_EVT_TIME       EVH_GRA=
PH
>>>           FIRST_SEEN              LAST_SEEN
>>> _________________ __________________ ___________________
>>> ______________________ ____________________ _______________ ___________=
____
>>> ______________________ ______________________
>>> log file sync              < 1024                      1
>>>    1.8                  1.8            12.8 |#         |    2020-01-29
>>> 13:04:13    2020-01-29 13:04:13
>>> log file sync              < 4096                      1
>>>      1                  4.1            29.1 |###       |    2020-01-29
>>> 13:04:12    2020-01-29 13:04:12
>>> log file sync              < 8192                      1
>>>      1                  8.2            58.2 |######    |    2020-01-29
>>> 13:04:20    2020-01-29 13:04:20
>>> row cache lock                < 1                      1
>>> 2469.1                  2.5             100 |##########|    2020-01-29
>>> 13:04:01    2020-01-29 13:04:01
>>>
>>> SQL> @ashtop username,event2,program2,blocking_session,blocking_inst_id
>>> "lower(program) like '%lg%'" "TIMESTAMP'2020-01-29 13:04:00'"
>>> "TIMESTAMP'2020-01-29 13:05:00'"
>>>
>>> no rows selected
>>>
>>> i'm going to copy over gv$active_session_history to a single instance
>>> database and take a look at ash_wait_chains but i don't expect that to
>>> reveal anything either.
>>>
>>> i need to confirm whether adding redo resolved issue (will have
>>> developer fire up workload) but I think next step is to investigate x$t=
race
>>> and possible process communication issues?
>>>
>>> On Thu, Jan 30, 2020 at 10:27 AM Chris Stephens <cstephens16@gmail.com>
>>> wrote:
>>>
>>>> Hey Tanel,
>>>>
>>>> Thanks for taking time to write this up! Will definitely be attending
>>>> that webinar.
>>>>
>>>> comments inline
>>>>
>>>> On Wed, Jan 29, 2020 at 10:32 PM Tanel Poder <tanel@tanelpoder.com>
>>>> wrote:
>>>>
>>>>> Comments:
>>>>>
>>>>> 1) First, looks like you're using all the right tools for systematic
>>>>> troubleshooting ;-)
>>>>>
>>>>
>>>> ***i try!
>>>>
>>>>>
>>>>> 2) As "log file sync" is not directly an I/O wait event (IO happens
>>>>> indirectly under the hood), but "wait for LGWR to ACK/increment redo
>>>>> persisting position" then there may be multiple reasons for delays. I=
'm
>>>>> repeating some of what others have said here too:
>>>>>   - LGWR could be waiting for remote ACK (synchronous data guard) -
>>>>> that should show up as a LNS* wait or something like that though
>>>>>   - Process slowness due to CPU overload/scheduling latency or
>>>>> swapping (in both cases, LGWR may do its work fast, but it takes a wh=
ile
>>>>> for the foreground process to wake up and end the wait event - you co=
uld
>>>>> check if there are any reports regarding switching LGWR mode in alert=
.log
>>>>> around that time)
>>>>>   - Bug/missed post due to switching between LGWR progress polling vs=
.
>>>>> waiting for LGWR post
>>>>>   - ASM instance communication issues (ASM communication can be on th=
e
>>>>> critical path when creating/changing files on ASM and even for readin=
g,
>>>>> when the ASM metadata cache in your local instance doesn't have what =
it
>>>>> needs)
>>>>>   - So, the physical redo log file disk write is not necessarily the
>>>>> biggest contributor to the log file sync wait
>>>>>
>>>>> *** no data guard in the env.
>>>> *** we have excess CPU. (20 cores / instance and usually only 1-5
>>>> active foreground processes but i will confirm from TFA collections)
>>>> *** yesterday i was ready to blame bug. today, not so sure. i checked
>>>> log file switches with a script google found me a few days ago. now
>>>> thinking that was inaccurate as a script from metastink shows tones of
>>>> switching so that could very well be related.
>>>> ***asm communication is something i need to look into. TFA has reporte=
d
>>>> some ASM issues but there's no details and i never saw anything in asm
>>>> alert logs indicating an issue.
>>>>
>>>>
>>>>> 3) You'll need to measure what LGWR itself was doing during the
>>>>> problem time
>>>>>   - Since you're on 19c (12c or later) the LGWR has slaves too and
>>>>> they may be doing most of the work
>>>>>   - You'll need to look into both the LGWR process and any other LG*
>>>>> ones too
>>>>>   - Since you have (Oracle) Snapper running already and writing to
>>>>> trace you should have the data already
>>>>>   - The *lgwr* in the following @snapper ash,stats,trace 10 999999
>>>>> lgwr syntax includes LGWR slaves too
>>>>>   - As LGWR slaves have their own sessions (and SIDs), you should run
>>>>> ashtop etc and report if there's any BLOCKING_SESSION value listed fo=
r LGWR
>>>>> and slaves
>>>>>   - If LGWR and the slaves were listed *idle* during the problem time
>>>>> (while others waited for log file sync event at the same time) *and
>>>>> if *there was no huge OS level CPU/memory shortage at the time, this
>>>>> looks like a background <-> foreground communication problem.
>>>>>      - This (and any potential ASM issues) could probably be
>>>>> troubleshooted via KST tracing, KSR channels and X$TRACE that could h=
elp
>>>>> here (some of it is enabled by default ( Like I showed in a previous
>>>>> hacking session: https://www.youtube.com/watch?v=3DmkmvZv58W6w )
>>>>>
>>>>>
>>>> here is all the output i have corresponding to a 6 sec "log file sync"
>>>> wait:
>>>>
>>>> Experience duration: *R* =3D *t*1 =E2=88=92 *t*0 =3D 41.336137 seconds
>>>> <#m_7864118618086203961_m_-5505970199061174881_m_7542945560652427325_m=
_-5509691095923674063_profile-by-subroutine>
>>>> Experience begin time: *t*0 =3D 2020-01-29T13:04:07.560478
>>>> Experience end time: *t*1 =3D 2020-01-29T13:04:48.896615
>>>> CALL-NAME                        DURATION       %   CALLS      MEAN
>>>>   MIN       MAX
>>>> ------------------------------  ---------  ------  ------  --------
>>>>  --------  --------
>>>> SQL*Net message from client     26.017105   66.5%   9,886  0.002632
>>>>  0.000209  7.858051
>>>> log file sync                   10.582409   27.1%     150  0.070549
>>>>  0.000295  6.909711
>>>> EXEC                             2.088757    5.3%   9,702  0.000215
>>>>  0.000000  0.209199
>>>> DLM cross inst call completion   0.109617    0.3%     212  0.000517
>>>>  0.000005  0.080575
>>>> row cache lock                   0.070118    0.2%     125  0.000561
>>>>  0.000208  0.000972
>>>> FETCH                            0.047586    0.1%   1,029  0.000046
>>>>  0.000000  0.003056
>>>> PARSE                            0.045293    0.1%   4,778  0.000009
>>>>  0.000000  0.002692
>>>> CLOSE                            0.033453    0.1%   4,784  0.000007
>>>>  0.000000  0.000237
>>>> library cache lock               0.023614    0.1%      55  0.000429
>>>>  0.000152  0.000901
>>>> library cache pin                0.022311    0.1%      55  0.000406
>>>>  0.000153  0.000689
>>>> 17 others                        0.071430    0.2%  10,772  0.000007
>>>>  0.000000  0.001487
>>>> ------------------------------  ---------  ------  ------  --------
>>>>  --------  --------
>>>> TOTAL (27)                      39.111693  100.0%  41,548  0.000941
>>>>  0.000000  7.858051
>>>>
>>>> 6 sec wait from raw data:
>>>>
>>>> *** 2020-01-29T13:04:20.724666-06:00
>>>> WAIT #0: nam=3D'log file sync' ela=3D 6909711 buffer#=3D58996 sync
>>>> scn=3D976089279 p3=3D0 obj#=3D-1 tim=3D1025228531958
>>>> WAIT #0: nam=3D'SQL*Net message to client' ela=3D 2 driver id=3D141369=
7536
>>>> #bytes=3D1 p3=3D0 obj#=3D-1 tim=3D1025228532118
>>>> WAIT #0: nam=3D'SQL*Net message from client' ela=3D 1711 driver
>>>> id=3D1413697536 #bytes=3D1 p3=3D0 obj#=3D-1 tim=3D1025228533852
>>>> BINDS #140204587734864:
>>>>
>>>> i enabled 10046 on lgwr:
>>>>
>>>> *** 2020-01-29T13:04:20.725994-06:00
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 2918443 timeout=3D300 p2=3D0=
 p3=3D0
>>>> obj#=3D-1 tim=3D1025228533313
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 279 files=3D1 blocks=
=3D17
>>>> requests=3D2 obj#=3D-1 tim=3D1025228533777
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 171 files=3D1 blocks=
=3D3
>>>> requests=3D2 obj#=3D-1 tim=3D1025228534104
>>>> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 22 copy latch #=3D28 p=
2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025228534182
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 183 files=3D1 blocks=
=3D2
>>>> requests=3D2 obj#=3D-1 tim=3D1025228534437
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 174 files=3D1 blocks=
=3D2
>>>> requests=3D2 obj#=3D-1 tim=3D1025228534721
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 31 timeout=3D8 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228534786
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 3233 timeout=3D8 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228538038
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 170 files=3D1 blocks=
=3D2
>>>> requests=3D1 obj#=3D-1 tim=3D1025228538297
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 25 timeout=3D8 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228538358
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 11829 timeout=3D8 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228550204
>>>> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 52 copy latch #=3D8 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025228550294
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 239 files=3D1 blocks=
=3D7
>>>> requests=3D1 obj#=3D-1 tim=3D1025228550624
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 23 timeout=3D6 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228550699
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16750 timeout=3D6 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228567467
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 207 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228567762
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 26 timeout=3D5 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228567825
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16249 timeout=3D5 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228584091
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 213 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228584400
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 48 timeout=3D3 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228584502
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16466 timeout=3D3 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228600993
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 228 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228601393
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 29 timeout=3D1 p2=3D0 p3=3D0=
 obj#=3D-1
>>>> tim=3D1025228601482
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 9548 timeout=3D1 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228611053
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 7467 timeout=3D300 p2=3D0 p3=
=3D0
>>>> obj#=3D-1 tim=3D1025228618574
>>>> WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 13 copy latch #=3D8 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025228618636
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 222 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228618960
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 31 timeout=3D300 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228619066
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 15946 timeout=3D300 p2=3D0 p=
3=3D0
>>>> obj#=3D-1 tim=3D1025228635045
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 240 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228635427
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 33 timeout=3D298 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228635525
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 15987 timeout=3D298 p2=3D0 p=
3=3D0
>>>> obj#=3D-1 tim=3D1025228651537
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 255 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228651912
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 51 timeout=3D296 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228652034
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16332 timeout=3D296 p2=3D0 p=
3=3D0
>>>> obj#=3D-1 tim=3D1025228668405
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 256 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228668804
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 37 timeout=3D295 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228668911
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 16969 timeout=3D295 p2=3D0 p=
3=3D0
>>>> obj#=3D-1 tim=3D1025228685907
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 273 files=3D1 blocks=
=3D9
>>>> requests=3D1 obj#=3D-1 tim=3D1025228686327
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 38 timeout=3D293 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228686453
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 6908 timeout=3D293 p2=3D0 p3=
=3D0
>>>> obj#=3D-1 tim=3D1025228693390
>>>> WAIT #0: nam=3D'log file parallel write' ela=3D 247 files=3D1 blocks=
=3D2
>>>> requests=3D1 obj#=3D-1 tim=3D1025228693768
>>>> WAIT #0: nam=3D'rdbms ipc message' ela=3D 38 timeout=3D292 p2=3D0 p3=
=3D0 obj#=3D-1
>>>> tim=3D1025228693862j
>>>>  ...more of the same
>>>> 264396 WAIT #0: nam=3D'rdbms ipc message' ela=3D 32 timeout=3D52 p2=3D=
0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231090742
>>>>  264397 WAIT #0: nam=3D'rdbms ipc message' ela=3D 19460 timeout=3D52 p=
2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025231110227
>>>>  264398 WAIT #0: nam=3D'log file parallel write' ela=3D 272 files=3D1
>>>> blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231110633
>>>>  264399 WAIT #0: nam=3D'rdbms ipc message' ela=3D 37 timeout=3D50 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231110748
>>>>  264400 WAIT #0: nam=3D'rdbms ipc message' ela=3D 15484 timeout=3D50 p=
2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025231126261
>>>>  264401 WAIT #0: nam=3D'log file parallel write' ela=3D 247 files=3D1
>>>> blocks=3D4 requests=3D1 obj#=3D-1 tim=3D1025231126641
>>>>  264402 WAIT #0: nam=3D'rdbms ipc message' ela=3D 36 timeout=3D49 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231126753
>>>>  264403 WAIT #0: nam=3D'rdbms ipc message' ela=3D 16732 timeout=3D49 p=
2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025231143514
>>>>  264404 WAIT #0: nam=3D'LGWR wait for redo copy' ela=3D 31 copy latch =
#=3D8
>>>> p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231143603
>>>>  264405 WAIT #0: nam=3D'log file parallel write' ela=3D 279 files=3D1
>>>> blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231143976
>>>>  264406 WAIT #0: nam=3D'rdbms ipc message' ela=3D 35 timeout=3D47 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231144093
>>>>  264407 WAIT #0: nam=3D'rdbms ipc message' ela=3D 5384 timeout=3D47 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231149515
>>>>  264408 WAIT #0: nam=3D'log file parallel write' ela=3D 255 files=3D1
>>>> blocks=3D4 requests=3D1 obj#=3D-1 tim=3D1025231149889
>>>>  264409 WAIT #0: nam=3D'rdbms ipc message' ela=3D 32 timeout=3D47 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025231149989
>>>>  264410
>>>>  264411 *** 2020-01-29T13:04:23.802774-06:00
>>>>  264412 WAIT #0: nam=3D'rdbms ipc message' ela=3D 460056 timeout=3D46 =
p2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025231610086
>>>>  264413
>>>>  264414 *** 2020-01-29T13:04:26.797925-06:00
>>>>  264415 WAIT #0: nam=3D'rdbms ipc message' ela=3D 2995027 timeout=3D30=
0 p2=3D0
>>>> p3=3D0 obj#=3D-1 tim=3D1025234605238
>>>>  264416 WAIT #0: nam=3D'rdbms ipc message' ela=3D 10649 timeout=3D1 p2=
=3D0 p3=3D0
>>>> obj#=3D-1 tim=3D1025234616054
>>>>  264417
>>>>
>>>> i will get you snapper lgwr output associated w/ this interval as soon
>>>> as i can find it (can't remember where i stashed it and lost trace fil=
e
>>>> name .. fml)
>>>>
>>>> 4) Paste the full *Snapper* output (of one time snapshot from the
>>>>> problem time) here if you can - we'll be able to see what the slaves =
were
>>>>> doing
>>>>>   - You'd need to include not only LGWR SID but also any LG* slave
>>>>> SIDs too
>>>>>   - Also we'll see better LGWR delay metrics from V$SESSTAT that go
>>>>> beyond just wait events
>>>>>   - Thinking of metrics like these - snapper already handles them (if
>>>>> you look into the rightmost column)
>>>>>      - *redo synch time overhead (usec)* - "FG wakeup overhead per
>>>>> log file sync"
>>>>>      - *redo write time* - accounting end to end redo write time (not
>>>>> just the log file parallel write syscall) "per redo write"
>>>>>
>>>>> 5) Regarding the *ASH wait chains* error on 19c - I'm aware of this
>>>>> but forgot to look into it
>>>>>    - It has (probably) something to do with GV$ views (or PX slaves
>>>>> used for PDB V$ queries) and possibly the connect by I'm using
>>>>>    - Does it work in the CDB (if multitenant)?
>>>>>    - Does it work with lower optimizer features enabled?
>>>>>    - As we are already manually looking into what LGWR and its slaves
>>>>> are doing, there's no need for the wait chains script here
>>>>>
>>>>> ***oracle has filed a bug. altering optimizer_features_enabled doesn'=
t
>>>> help.
>>>>
>>>>
>>>>> 6) Linux Process Snapper - pSnapper
>>>>>    - If you want to see the full "wait" states of Linux
>>>>> processes/threads, use the* -a *option.
>>>>>    - It's somewhat like always showing "idle" threads too
>>>>>    - But for linux apps it's harder to determine what's idle and
>>>>> what's some network RPC wait, so I currently just do what Linux "load=
"
>>>>> metric does
>>>>>    - I just shows threads in Runnable and Disk/Uninterruptible sleep
>>>>> states by default (-a shows all states, including Sleep and kernel-Id=
le
>>>>> threads etc)
>>>>>    - So the Linux pSnapper "idle" means a somewhat different thing
>>>>> than Oracle "idle" - say LGWR was waiting for a Data Guard ACK, that =
should
>>>>> not be shown as Idle in Oracle, but at the Linux process level it's "=
just
>>>>> some network socket wait" and you'd need to use *-a* option to show
>>>>> all of them
>>>>>
>>>>
>>>> *** will add that option to future collections (and watch the
>>>> video/read help)
>>>>
>>>>>
>>>>> 7) Were LGWR/LG* processes mostly idle during the problem time?
>>>>>    - As I kept reading the thread, it looks like ASH reported LGWR
>>>>> being mostly idle during the problem time
>>>>>    - This is not exactly the same pattern, but I've seen cases in the
>>>>> past where everyone's waiting for LGWR and LGWR doesn't seem to be bu=
sy and
>>>>> doesn't show any significant waits
>>>>>    - It turned out that LGWR was actually waiting for CKPT to make
>>>>> progress, but it wasn't synchronously waiting, but just going to slee=
p and
>>>>> "checking again later" in a loop
>>>>>    - I wrote about this here:
>>>>> https://blog.tanelpoder.com/posts/log-file-switch-checkpoint-incomple=
te-and-lgwr-waiting-for-checkpoint-progress/
>>>>>    - So it may still make sense to check what LGWR/LG* processes are
>>>>> doing even if they don't seem to be active much (like do you see an
>>>>> occasional controlfile read wait by LGWR etc)
>>>>>    - SQL*Trace would capture all activity of LGWR, but may slow thing=
s
>>>>> down further, so you could also sample & spool V$SESSION_WAIT_HISTORY=
 (that
>>>>> keeps a log of last 10 waits of every session in memory)
>>>>>
>>>>> *** i think i've addressed most of this.
>>>>
>>>>
>>>>> Incidentally I'm doing another hacking session tomorrow (Thursday 30
>>>>> Jan @ 12pm ET) about complex Oracle performance problems that span mu=
ltiple
>>>>> layers and have multiple moving parts. I can add some LGWR troublesho=
oting
>>>>> magic in the end of that (here goes my plan to fit the session in 1.5=
h
>>>>> again!). I'll post the details in my blog in a few minutes.
>>>>>
>>>>> *** can't wait!!
>>>>
>>>>> --
>>>>> Tanel Poder
>>>>> https://blog.tanelpoder.com/seminar
>>>>>
>>>>>
>>>>> On Tue, Jan 28, 2020 at 2:22 PM Chris Stephens <cstephens16@gmail.com=
>
>>>>> wrote:
>>>>>
>>>>>> doesn't look like any ash data for lgwr:
>>>>>>
>>>>>> SQL> @ashtop session_id,blocking_session,event2 1=3D1
>>>>>> "TIMESTAMP'2020-01-27 15:50:48'" "TIMESTAMP'2020-01-27 15:52:40'"
>>>>>>    TOTALSECONDS    AAS      %This    SESSION_ID    BLOCKING_SESSION
>>>>>>              EVENT2             FIRST_SEEN              LAST_SEEN
>>>>>>  DIST_SQLEXEC_SEEN
>>>>>> _______________ ______ __________ _____________ ___________________
>>>>>> _____________________ ______________________ ______________________
>>>>>> ____________________
>>>>>>              81    0.7   82% |             1713                1710
>>>>>> log file sync         2020-01-27 15:50:54    2020-01-27 15:52:14
>>>>>>             1
>>>>>>               5      0    5% |             1221
>>>>>> ON CPU                2020-01-27 15:50:59    2020-01-27 15:52:14
>>>>>>             1
>>>>>>               3      0    3% |              246
>>>>>> ON CPU                2020-01-27 15:51:00    2020-01-27 15:52:00
>>>>>>             1
>>>>>>               3      0    3% |             1713
>>>>>> ON CPU                2020-01-27 15:50:51    2020-01-27 15:52:27
>>>>>>             2
>>>>>>               1      0    1% |              123
>>>>>> ON CPU                2020-01-27 15:51:35    2020-01-27 15:51:35
>>>>>>             1
>>>>>>               1      0    1% |              127
>>>>>> ON CPU                2020-01-27 15:51:50    2020-01-27 15:51:50
>>>>>>             1
>>>>>>               1      0    1% |              252                2321
>>>>>> latch free            2020-01-27 15:52:28    2020-01-27 15:52:28
>>>>>>             1
>>>>>>               1      0    1% |              978
>>>>>> ges remote message    2020-01-27 15:50:48    2020-01-27 15:50:48
>>>>>>             1
>>>>>>               1      0    1% |              983
>>>>>> latch free            2020-01-27 15:52:28    2020-01-27 15:52:28
>>>>>>             1
>>>>>>               1      0    1% |             1713
>>>>>> library cache lock    2020-01-27 15:50:48    2020-01-27 15:50:48
>>>>>>             1
>>>>>>               1      0    1% |             1831
>>>>>> ON CPU                2020-01-27 15:50:49    2020-01-27 15:50:49
>>>>>>             1
>>>>>>
>>>>>>
>>>>>> 11 rows selected.
>>>>>>
>>>>>>
>>>>>> SQL> @ashtop session_id,blocking_session,event2 session_id=3D1710
>>>>>> "TIMESTAMP'2020-01-27 15:50:48'" "TIMESTAMP'2020-01-27 15:52:40'"
>>>>>>
>>>>>> no rows selected
>>>>>>
>>>>>> On Tue, Jan 28, 2020 at 1:09 PM Noveljic Nenad <
>>>>>> nenad.noveljic@vontobel.com> wrote:
>>>>>>
>>>>>>> You=E2=80=99re using v0.2, but  v0.6 should be the latest version. =
0.2
>>>>>>> wouldn=E2=80=99t show idle blockers.
>>>>>>>
>>>>>>> I looked at the ashtop output once again. Strange that lgwr
>>>>>>> processes nor any other bg processes show up significantly there. T=
he
>>>>>>> question is, what was lgwr doing while the fg processes were waitin=
g on him.
>>>>>>>
>>>>>>>

--0000000000001b00b1059d9f0f0f
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr">Btw, adaptive log file sync is a differen=
t thing than the adaptive switch between LGWR-only processing vs. LGWR+LG% =
slave processing (that&#39;s why I mentioned this parameter: =C2=A0_use_sin=
gle_log_writer=3Dadaptive)<br></div><div dir=3D"ltr"><br></div></div><br><d=
iv class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On Sun, Feb =
2, 2020 at 10:20 AM Chris Stephens &lt;<a href=3D"mailto:cstephens16@gmail.=
com">cstephens16@gmail.com</a>&gt; wrote:<br></div><blockquote class=3D"gma=
il_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-le=
ft-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div di=
r=3D"ltr"><font face=3D"monospace">SQL&gt; @ashtopLFS username,event2,progr=
am2,blocking_session,blocking_inst_id &quot;program like &#39;%(LG%)&#39;&q=
uot; &quot;TIMESTAMP&#39;2020-01-29 13:04:00&#39;&quot; &quot;TIMESTAMP&#39=
;2020-01-29 13:05:00&#39;&quot;<br>no rows selected<br>SQL&gt;<br></font><d=
iv><font face=3D"monospace"><br></font></div><div><font face=3D"monospace">=
i pulled over all the tables/views referenced in ashtop.sql and ash_wait_ch=
ains.sql to a single instance database. Hence, &quot;ashtopLFS.sql&quot;</f=
ont></div><div><font face=3D"monospace"><br></font></div><div><font face=3D=
"monospace">From the beginning we noticed slow writes to local disc (&gt;40=
ms at times). On Friday, as I was running out of ideas, the other DBA on th=
e project suggested we create a ramdisk for /tmp as we noticed lots of writ=
es to /tmp/.oracle. After doing that and rebooting all nodes, we are now un=
able to reproduce the issue.=C2=A0</font></div><div><font face=3D"monospace=
"><br></font></div><div><font face=3D"monospace">I&#39;d already turned off=
 adaptive log file syn:</font></div><div><font face=3D"monospace"><br></fon=
t></div><div><font face=3D"monospace">SQL&gt; @pd &quot;_use_adaptive&quot;=
<br>Show all parameters and session values from x$ksppi/x$ksppcv...<br><br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0NUM N_HEX NAME =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 VALUE =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0DESCRIPTION<br>---------- ----- ----------------=
---------------------------------------- ------------------------------ ---=
---------------------------------------------------------------------------=
---------------------<br>=C2=A0 =C2=A0 =C2=A0 2049 =C2=A0 801 _use_adaptive=
_log_file_sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0FALSE =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Adaptively =
switch between post/wait and polling<br><br>SQL&gt;<br></font></div><div><f=
ont face=3D"monospace"><br></font></div><div><font face=3D"monospace">On Mo=
nday (or Tuesday if I end up taking tomorrow off), we plan to remove the ra=
mdisk and see if the issue reproduces again.</font></div><div><font face=3D=
"monospace"><br></font></div><div><br></div></div><br><div class=3D"gmail_q=
uote"><div dir=3D"ltr" class=3D"gmail_attr">On Sat, Feb 1, 2020 at 9:34 PM =
Tanel Poder &lt;<a href=3D"mailto:tanel@tanelpoder.com" target=3D"_blank">t=
anel@tanelpoder.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quot=
e" style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-styl=
e:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir=3D"lt=
r"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr">Hi C=
hris,<div><br></div><div>As 12c+ has log writer slaves and Oracle can adapt=
avely switch=C2=A0from single LGWR to LGWR=C2=A0+ LGxx slave (_use_single_l=
og_writer=3Dadaptive), can you check ASH data for the LGxx slaves at this t=
ime too. Not all process-to-process interactions/waits are instrumented wit=
h a blocking_session in Oracle...</div><div><br></div><div>Should be this c=
ommand:</div><div><br></div><div><div><font face=3D"monospace">@ashtop user=
name,event2,program2,blocking_session,blocking_inst_id &quot;program like &=
#39;%(LG%)&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:04:00&#39;&quot; &q=
uot;TIMESTAMP&#39;2020-01-29 13:05:00&#39;&quot;</font></div></div><div><br=
></div><div>--<br></div><div>Tanel Poder</div><div><a href=3D"https://blog.=
tanelpoder.com/seminar" target=3D"_blank">https://blog.tanelpoder.com/semin=
ar</a></div><div><br></div></div></div></div></div><br><div class=3D"gmail_=
quote"><div dir=3D"ltr" class=3D"gmail_attr">On Thu, Jan 30, 2020 at 3:44 P=
M Chris Stephens &lt;<a href=3D"mailto:cstephens16@gmail.com" target=3D"_bl=
ank">cstephens16@gmail.com</a>&gt; wrote:<br></div><blockquote class=3D"gma=
il_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-le=
ft-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div di=
r=3D"ltr"><font face=3D"monospace">Here&#39;s some additional ash output fr=
om the trace data posted in last message including event_hist.sql:</font><d=
iv><font face=3D"monospace"><br></font></div><div><font face=3D"monospace">=
SQL&gt; @ashtop username,event2,program2,blocking_session,blocking_inst_id =
1=3D1 &quot;TIMESTAMP&#39;2020-01-29 13:04:20&#39;&quot; &quot;TIMESTAMP&#3=
9;2020-01-29 13:04:27&#39;&quot;<br>=C2=A0 =C2=A0TOTALSECONDS =C2=A0 =C2=A0=
AAS =C2=A0 =C2=A0 =C2=A0%This =C2=A0 =C2=A0 =C2=A0 =C2=A0 USERNAME =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 EVENT2 =C2=A0 =C2=A0 =C2=A0PROGRAM2 =C2=A0 =C2=
=A0BLOCKING_SESSION =C2=A0 =C2=A0BLOCKING_INST_ID =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 FIRST_SEEN =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0LAST_SEEN =C2=A0 =C2=A0DIST_SQLEXEC_SEEN<br>_______________ ______ _____=
_____ ________________ ________________ _____________ ___________________ _=
__________________ ______________________ ______________________ __________=
__________<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2 =C2=A0 =C2=
=A00.3 =C2=A0 50% | =C2=A0 =C2=A0SYS =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (sqlplus) =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2020-01-=
29 13:04:24 =C2=A0 =C2=A02020-01-29 13:04:25 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A00.1 =C2=A0 25% | =C2=A0 =C2=A0GE=
N3_MGOWER_3 =C2=A0 =C2=A0ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (python)=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A02020-01-29 13:04:22 =C2=A0 =C2=A02020-01-29 13:04:22 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A00.1 =C2=A0 =
25% | =C2=A0 =C2=A0GEN3_MGOWER_3 =C2=A0 =C2=A0log file sync =C2=A0 =C2=A0(p=
ython) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 1710 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2 2=
020-01-29 13:04:20 =C2=A0 =C2=A02020-01-29 13:04:20 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br><br><br>SQL&gt=
; @bg lgwr<br>=C2=A0 =C2=A0NAME =C2=A0 =C2=A0DESCRIPTION =C2=A0 =C2=A0 SID =
=C2=A0 =C2=A0OPID =C2=A0 =C2=A0 SPID =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 PADDR =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 SADDR<br>=
_______ ______________ _______ _______ ________ ___________________ _______=
____________<br>LGWR =C2=A0 =C2=A0Redo etc. =C2=A0 =C2=A0 =C2=A0 =C2=A0 171=
0 =C2=A0 =C2=A0 =C2=A034 26552 =C2=A0 =C2=A000000001E8718860 =C2=A0 =C2=A00=
0000001D8BFF4A0<br></font></div><div><font face=3D"monospace"><br></font></=
div><div><font face=3D"monospace">SQL&gt; @ashtop username,event2,program2,=
blocking_session,blocking_inst_id session_id=3D1710 &quot;TIMESTAMP&#39;202=
0-01-29 13:04:20&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:04:27&#39;&qu=
ot;<br><br>no rows selected<br><br>SQL&gt; @ashtop username,event2,program2=
,blocking_session,blocking_inst_id session_id=3D1710 &quot;TIMESTAMP&#39;20=
20-01-29 13:04:00&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:05:00&#39;&q=
uot;<br><br>no rows selected<br></font></div><div><font face=3D"monospace">=
<br></font></div><div><font face=3D"monospace">SQL&gt; @ash/event_hist &quo=
t;log file&quot; 1=3D1 =C2=A0&quot;TIMESTAMP&#39;2020-01-29 13:04:00&#39;&q=
uot; &quot;TIMESTAMP&#39;2020-01-29 13:05:00&#39;&quot;<br>=C2=A0 =C2=A0 =
=C2=A0 =C2=A0EVH_EVENT =C2=A0 =C2=A0 =C2=A0 EVH_MILLISEC =C2=A0 =C2=A0EVH_S=
AMPLE_COUNT =C2=A0 =C2=A0EVH_EST_EVENT_COUNT =C2=A0 =C2=A0EVH_EST_TOTAL_SEC=
 =C2=A0 =C2=A0PCT_EVT_TIME =C2=A0 =C2=A0 =C2=A0 EVH_GRAPH =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 FIRST_SEEN =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0LAST_SEEN<br>________________ __________________ ________________=
___ ______________________ ____________________ _______________ ___________=
____ ______________________ ______________________<br>log file sync =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &lt; 1024 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01.8 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01.8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A012.8 |# =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A02020-01-29 13:=
04:13 =C2=A0 =C2=A02020-01-29 13:04:13<br>log file sync =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 &lt; 4096 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A04.1 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A029.1 |### =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A02020-01-29 13:04:12 =C2=A0=
 =C2=A02020-01-29 13:04:12<br>log file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 &lt; 8192 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A08.2 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A058.2 |###### =
=C2=A0 =C2=A0| =C2=A0 =C2=A02020-01-29 13:04:20 =C2=A0 =C2=A02020-01-29 13:=
04:20<br></font></div><div><font face=3D"monospace"><br></font></div><div><=
font face=3D"monospace">SQL&gt; @ash/event_hist &quot;write&quot; 1=3D1 =C2=
=A0&quot;TIMESTAMP&#39;2020-01-29 13:04:00&#39;&quot; &quot;TIMESTAMP&#39;2=
020-01-29 13:05:00&#39;&quot;<br><br>no rows selected<br><br>SQL&gt; @ash/e=
vent_hist &quot;*&quot; 1=3D1 =C2=A0&quot;TIMESTAMP&#39;2020-01-29 13:04:00=
&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:05:00&#39;&quot;<br>=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 EVH_EVENT =C2=A0 =C2=A0 =C2=A0 EVH_MILLISEC =C2=A0 =C2=
=A0EVH_SAMPLE_COUNT =C2=A0 =C2=A0EVH_EST_EVENT_COUNT =C2=A0 =C2=A0EVH_EST_T=
OTAL_SEC =C2=A0 =C2=A0PCT_EVT_TIME =C2=A0 =C2=A0 =C2=A0 EVH_GRAPH =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 FIRST_SEEN =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0LAST_SEEN<br>_________________ __________________ _____=
______________ ______________________ ____________________ _______________ =
_______________ ______________________ ______________________<br>log file s=
ync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt; 1024 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01.8 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01.8 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A012.8 |# =C2=A0 =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A0=
2020-01-29 13:04:13 =C2=A0 =C2=A02020-01-29 13:04:13<br>log file sync =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt; 4096 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A04.1 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A029.1 |### =C2=A0 =C2=A0 =C2=A0 | =C2=A0 =C2=A02020-01-2=
9 13:04:12 =C2=A0 =C2=A02020-01-29 13:04:12<br>log file sync =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt; 8192 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A08.2 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A058.2 |###### =C2=A0 =C2=A0| =C2=A0 =C2=A02020-01-29 13:04:20 =C2=
=A0 =C2=A02020-01-29 13:04:20<br>row cache lock =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt; 1 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 2469.1 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A02.5 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 100 |#######=
###| =C2=A0 =C2=A02020-01-29 13:04:01 =C2=A0 =C2=A02020-01-29 13:04:01<br><=
/font></div><div><font face=3D"monospace"><br></font></div><div><font face=
=3D"monospace">SQL&gt; @ashtop username,event2,program2,blocking_session,bl=
ocking_inst_id &quot;lower(program) like &#39;%lg%&#39;&quot; &quot;TIMESTA=
MP&#39;2020-01-29 13:04:00&#39;&quot; &quot;TIMESTAMP&#39;2020-01-29 13:05:=
00&#39;&quot;<br><br>no rows selected<br></font></div><div><font face=3D"mo=
nospace"><br></font></div><div><font face=3D"monospace">i&#39;m going to co=
py over gv$active_session_history to a single instance database and take a =
look at ash_wait_chains but i don&#39;t expect that to reveal anything eith=
er.</font></div><div><font face=3D"monospace"><br></font></div><div><font f=
ace=3D"monospace">i need to confirm whether adding redo resolved issue (wil=
l have developer fire up workload) but I think next step is to investigate =
x$trace and possible process communication issues?</font></div></div><br><d=
iv class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On Thu, Jan =
30, 2020 at 10:27 AM Chris Stephens &lt;<a href=3D"mailto:cstephens16@gmail=
.com" target=3D"_blank">cstephens16@gmail.com</a>&gt; wrote:<br></div><bloc=
kquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left-=
width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);paddin=
g-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr">Hey=C2=A0Tanel,<div><br></div=
><div>Thanks for taking time to write this up! Will definitely=C2=A0be atte=
nding that webinar.</div><div><br></div><div>comments inline</div></div><br=
><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On Wed, J=
an 29, 2020 at 10:32 PM Tanel Poder &lt;<a href=3D"mailto:tanel@tanelpoder.=
com" target=3D"_blank">tanel@tanelpoder.com</a>&gt; wrote:<br></div><blockq=
uote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left-wi=
dth:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-=
left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"lt=
r"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div=
>Comments:</div><div><br></div><div>1) First, looks like you&#39;re using a=
ll the right tools for systematic troubleshooting ;-)</div></div></div></di=
v></div></div></div></div></div></blockquote><div><br></div><div>***i try!=
=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0=
.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(20=
4,204,204);padding-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"=
ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><d=
iv dir=3D"ltr"><div><br></div><div>2) As &quot;log file sync&quot; is not d=
irectly an I/O wait event (IO happens indirectly under the hood), but &quot=
;wait for LGWR to ACK/increment redo persisting position&quot; then there m=
ay be multiple reasons for delays. I&#39;m repeating some of what others ha=
ve said here too:</div><div>=C2=A0 - LGWR could be waiting for remote ACK (=
synchronous data guard) - that should show up as a LNS* wait or something l=
ike that though</div><div>=C2=A0 - Process slowness due to CPU overload/sch=
eduling latency=C2=A0or swapping (in both cases, LGWR may do its work fast,=
 but it takes a while for the foreground process to wake up and end the wai=
t event - you could check if there are any reports regarding switching LGWR=
 mode in alert.log around that time)</div><div>=C2=A0 - Bug/missed post due=
 to switching between LGWR progress polling vs. waiting for LGWR post</div>=
<div>=C2=A0 - ASM instance communication issues (ASM communication can be o=
n the critical path when creating/changing files on ASM and even for readin=
g, when the ASM metadata cache in your local instance doesn&#39;t have what=
 it needs)</div><div>=C2=A0 - So, the physical redo log file disk write is =
not necessarily the biggest contributor to the log file sync wait</div><div=
><br></div></div></div></div></div></div></div></div></div></blockquote><di=
v>*** no data guard in the env.</div><div>*** we have excess CPU. (20 cores=
 / instance and usually only 1-5 active foreground processes but i will con=
firm from TFA collections)</div><div>*** yesterday i was ready to blame bug=
. today, not so sure. i checked log file switches with a script google foun=
d me a few days ago. now thinking that was inaccurate as a script from meta=
stink shows tones of switching so that could very well be related.</div><di=
v>***asm communication is something i need to look into. TFA has reported s=
ome ASM issues but there&#39;s no details and i never saw anything in asm a=
lert logs indicating an issue.</div><div>=C2=A0</div><blockquote class=3D"g=
mail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-=
left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div =
dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"=
ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div></div><div>3) =
You&#39;ll need to measure what LGWR itself was doing during the problem ti=
me</div><div>=C2=A0 - Since you&#39;re on 19c (12c or later) the LGWR has s=
laves too and they may be doing most of the work=C2=A0</div><div>=C2=A0 - Y=
ou&#39;ll need to look into both the LGWR process and any other LG* ones to=
o</div><div>=C2=A0 - Since you have (Oracle) Snapper running already and wr=
iting to trace you should have the data already</div><div dir=3D"ltr">=C2=
=A0 - The <b>lgwr</b>=C2=A0in the following <font face=3D"monospace">@snapp=
er ash,stats,trace 10 999999 lgwr</font> syntax includes LGWR slaves too<br=
></div><div dir=3D"ltr">=C2=A0 - As LGWR slaves have their own sessions (an=
d SIDs), you should run ashtop etc and report if there&#39;s any=C2=A0BLOCK=
ING_SESSION value listed for LGWR and slaves<br></div><div>=C2=A0 - If LGWR=
 and the slaves were listed <i>idle</i> during the problem time (while othe=
rs waited for log file sync event at the same time) <i>and if=C2=A0</i>ther=
e was no huge OS level CPU/memory shortage at the time, this looks like a b=
ackground &lt;-&gt; foreground communication problem.=C2=A0</div><div>=C2=
=A0 =C2=A0 =C2=A0- This (and any potential ASM issues) could probably be tr=
oubleshooted via KST tracing, KSR channels and X$TRACE that could help here=
 (some of it is enabled by default ( Like I showed in a previous hacking se=
ssion: <a href=3D"https://www.youtube.com/watch?v=3DmkmvZv58W6w" target=3D"=
_blank">https://www.youtube.com/watch?v=3DmkmvZv58W6w</a> )</div><div><br><=
/div></div></div></div></div></div></div></div></div></blockquote><div><br>=
</div><div>here is all the output i have corresponding to a 6 sec &quot;log=
 file sync&quot; wait:</div><div><br></div><div><table style=3D"vertical-al=
ign:baseline;font-size:12px;line-height:13.5pt;font-family:Lato,verdana,&qu=
ot;bitstream vera sans&quot;,&quot;dejavu sans&quot;,&quot;liberation sans&=
quot;,geneva,sans-serif;color:rgb(0,0,0);margin:2ex 0px 2ex 40px;border-col=
lapse:collapse;border:0px"><tbody style=3D"vertical-align:baseline"><tr sty=
le=3D"vertical-align:baseline"><td style=3D"vertical-align:baseline;padding=
:0.5ex 0px 0.5ex 1em;text-align:right;font-size:10.2px;border:0px;white-spa=
ce:nowrap">Experience duration:</td><td style=3D"vertical-align:baseline;pa=
dding:0.5ex 1em;width:100em"><em style=3D"vertical-align:baseline">R</em>=
=C2=A0=3D=C2=A0<em style=3D"vertical-align:baseline">t</em><span style=3D"v=
ertical-align:0px;font-size:7.2px">1</span>=C2=A0=E2=88=92=C2=A0<em style=
=3D"vertical-align:baseline">t</em><span style=3D"vertical-align:0px;font-s=
ize:7.2px">0</span>=C2=A0=3D=C2=A0<span title=3D"This report is relevant on=
ly if the end-user experience you are analyzing has consumed the duration s=
hown here. If this duration does not correspond to your actual end-user per=
formance experience, then you should repair your trace data or collect trac=
e data more carefully for another experience. Time your data collection act=
ivation and deactivation to match precisely the begin and end times of the =
experience you are analyzing." style=3D"vertical-align:baseline;color:tomat=
o;font-weight:bold"><a href=3D"#m_7864118618086203961_m_-550597019906117488=
1_m_7542945560652427325_m_-5509691095923674063_profile-by-subroutine" style=
=3D"vertical-align:baseline;text-decoration-line:none;color:tomato">41.3361=
37=C2=A0seconds</a></span></td></tr><tr style=3D"vertical-align:baseline"><=
td style=3D"vertical-align:baseline;padding:0.5ex 0px 0.5ex 1em;text-align:=
right;font-size:10.2px;border:0px;white-space:nowrap">Experience begin time=
:</td><td style=3D"vertical-align:baseline;padding:0.5ex 1em;width:100em"><=
em style=3D"vertical-align:baseline">t</em><span style=3D"vertical-align:0p=
x;font-size:7.2px">0</span>=C2=A0=3D 2020-01-29T13:04:07.560478</td></tr><t=
r style=3D"vertical-align:baseline"><td style=3D"vertical-align:baseline;pa=
dding:0.5ex 0px 0.5ex 1em;text-align:right;font-size:10.2px;border:0px;whit=
e-space:nowrap">Experience end time:</td><td style=3D"vertical-align:baseli=
ne;padding:0.5ex 1em;width:100em"><em style=3D"vertical-align:baseline">t</=
em><span style=3D"vertical-align:0px;font-size:7.2px">1</span>=C2=A0=3D 202=
0-01-29T13:04:48.896615</td></tr></tbody></table></div>CALL-NAME =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0DU=
RATION =C2=A0 =C2=A0 =C2=A0 % =C2=A0 CALLS =C2=A0 =C2=A0 =C2=A0MEAN =C2=A0 =
=C2=A0 =C2=A0 MIN =C2=A0 =C2=A0 =C2=A0 MAX<br>-----------------------------=
- =C2=A0--------- =C2=A0------ =C2=A0------ =C2=A0-------- =C2=A0-------- =
=C2=A0--------<br>SQL*Net message from client =C2=A0 =C2=A0 26.017105 =C2=
=A0 66.5% =C2=A0 9,886 =C2=A00.002632 =C2=A00.000209 =C2=A07.858051<br>log =
file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 10=
.582409 =C2=A0 27.1% =C2=A0 =C2=A0 150 =C2=A00.070549 =C2=A00.000295 =C2=A0=
6.909711<br>EXEC =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2.088757 =C2=A0 =C2=A05.3% =C2=A0=
 9,702 =C2=A00.000215 =C2=A00.000000 =C2=A00.209199<br>DLM cross inst call =
completion =C2=A0 0.109617 =C2=A0 =C2=A00.3% =C2=A0 =C2=A0 212 =C2=A00.0005=
17 =C2=A00.000005 =C2=A00.080575<br>row cache lock =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0.070118 =C2=A0 =C2=A00.2% =C2=A0 =
=C2=A0 125 =C2=A00.000561 =C2=A00.000208 =C2=A00.000972<br>FETCH =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A00.047586 =C2=A0 =C2=A00.1% =C2=A0 1,029 =C2=A00.000046 =C2=A00=
.000000 =C2=A00.003056<br>PARSE =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00.045293 =C2=A0 =C2=
=A00.1% =C2=A0 4,778 =C2=A00.000009 =C2=A00.000000 =C2=A00.002692<br>CLOSE =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A00.033453 =C2=A0 =C2=A00.1% =C2=A0 4,784 =C2=A00.000=
007 =C2=A00.000000 =C2=A00.000237<br>library cache lock =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0.023614 =C2=A0 =C2=A00.1% =C2=A0 =C2=A0 =
=C2=A055 =C2=A00.000429 =C2=A00.000152 =C2=A00.000901<br>library cache pin =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00.022311 =C2=A0 =C2=
=A00.1% =C2=A0 =C2=A0 =C2=A055 =C2=A00.000406 =C2=A00.000153 =C2=A00.000689=
<br>17 others =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A00.071430 =C2=A0 =C2=A00.2% =C2=A010,772 =C2=A00.000=
007 =C2=A00.000000 =C2=A00.001487<br>------------------------------ =C2=A0-=
-------- =C2=A0------ =C2=A0------ =C2=A0-------- =C2=A0-------- =C2=A0----=
----<br><div>TOTAL (27) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A039.111693 =C2=A0100.0% =C2=A041,548 =C2=A00.0009=
41 =C2=A00.000000 =C2=A07.858051</div><div><br></div><div>6 sec wait from r=
aw data:</div><div><br></div><div>*** 2020-01-29T13:04:20.724666-06:00<br>W=
AIT #0: nam=3D&#39;log file sync&#39; ela=3D 6909711 buffer#=3D58996 sync s=
cn=3D976089279 p3=3D0 obj#=3D-1 tim=3D1025228531958<br>WAIT #0: nam=3D&#39;=
SQL*Net message to client&#39; ela=3D 2 driver id=3D1413697536 #bytes=3D1 p=
3=3D0 obj#=3D-1 tim=3D1025228532118<br>WAIT #0: nam=3D&#39;SQL*Net message =
from client&#39; ela=3D 1711 driver id=3D1413697536 #bytes=3D1 p3=3D0 obj#=
=3D-1 tim=3D1025228533852<br>BINDS #140204587734864:<br></div><div><br></di=
v><div>i enabled 10046 on lgwr:</div><div><br></div><div>*** 2020-01-29T13:=
04:20.725994-06:00<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 291=
8443 timeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228533313<br>WAIT #0: =
nam=3D&#39;log file parallel write&#39; ela=3D 279 files=3D1 blocks=3D17 re=
quests=3D2 obj#=3D-1 tim=3D1025228533777<br>WAIT #0: nam=3D&#39;log file pa=
rallel write&#39; ela=3D 171 files=3D1 blocks=3D3 requests=3D2 obj#=3D-1 ti=
m=3D1025228534104<br>WAIT #0: nam=3D&#39;LGWR wait for redo copy&#39; ela=
=3D 22 copy latch #=3D28 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228534182<br>WAI=
T #0: nam=3D&#39;log file parallel write&#39; ela=3D 183 files=3D1 blocks=
=3D2 requests=3D2 obj#=3D-1 tim=3D1025228534437<br>WAIT #0: nam=3D&#39;log =
file parallel write&#39; ela=3D 174 files=3D1 blocks=3D2 requests=3D2 obj#=
=3D-1 tim=3D1025228534721<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=
=3D 31 timeout=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228534786<br>WAIT #0: =
nam=3D&#39;rdbms ipc message&#39; ela=3D 3233 timeout=3D8 p2=3D0 p3=3D0 obj=
#=3D-1 tim=3D1025228538038<br>WAIT #0: nam=3D&#39;log file parallel write&#=
39; ela=3D 170 files=3D1 blocks=3D2 requests=3D1 obj#=3D-1 tim=3D1025228538=
297<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 25 timeout=3D8 p2=
=3D0 p3=3D0 obj#=3D-1 tim=3D1025228538358<br>WAIT #0: nam=3D&#39;rdbms ipc =
message&#39; ela=3D 11829 timeout=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228=
550204<br>WAIT #0: nam=3D&#39;LGWR wait for redo copy&#39; ela=3D 52 copy l=
atch #=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228550294<br>WAIT #0: nam=3D&#=
39;log file parallel write&#39; ela=3D 239 files=3D1 blocks=3D7 requests=3D=
1 obj#=3D-1 tim=3D1025228550624<br>WAIT #0: nam=3D&#39;rdbms ipc message&#3=
9; ela=3D 23 timeout=3D6 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228550699<br>WAI=
T #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 16750 timeout=3D6 p2=3D0 p3=
=3D0 obj#=3D-1 tim=3D1025228567467<br>WAIT #0: nam=3D&#39;log file parallel=
 write&#39; ela=3D 207 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D10=
25228567762<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 26 timeout=
=3D5 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228567825<br>WAIT #0: nam=3D&#39;rdb=
ms ipc message&#39; ela=3D 16249 timeout=3D5 p2=3D0 p3=3D0 obj#=3D-1 tim=3D=
1025228584091<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D 21=
3 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025228584400<br>WAIT #=
0: nam=3D&#39;rdbms ipc message&#39; ela=3D 48 timeout=3D3 p2=3D0 p3=3D0 ob=
j#=3D-1 tim=3D1025228584502<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; e=
la=3D 16466 timeout=3D3 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228600993<br>WAIT=
 #0: nam=3D&#39;log file parallel write&#39; ela=3D 228 files=3D1 blocks=3D=
9 requests=3D1 obj#=3D-1 tim=3D1025228601393<br>WAIT #0: nam=3D&#39;rdbms i=
pc message&#39; ela=3D 29 timeout=3D1 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228=
601482<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 9548 timeout=3D=
1 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228611053<br>WAIT #0: nam=3D&#39;rdbms =
ipc message&#39; ela=3D 7467 timeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=3D10=
25228618574<br>WAIT #0: nam=3D&#39;LGWR wait for redo copy&#39; ela=3D 13 c=
opy latch #=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228618636<br>WAIT #0: nam=
=3D&#39;log file parallel write&#39; ela=3D 222 files=3D1 blocks=3D9 reques=
ts=3D1 obj#=3D-1 tim=3D1025228618960<br>WAIT #0: nam=3D&#39;rdbms ipc messa=
ge&#39; ela=3D 31 timeout=3D300 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228619066=
<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 15946 timeout=3D300 p=
2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228635045<br>WAIT #0: nam=3D&#39;log file =
parallel write&#39; ela=3D 240 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 =
tim=3D1025228635427<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 33=
 timeout=3D298 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228635525<br>WAIT #0: nam=
=3D&#39;rdbms ipc message&#39; ela=3D 15987 timeout=3D298 p2=3D0 p3=3D0 obj=
#=3D-1 tim=3D1025228651537<br>WAIT #0: nam=3D&#39;log file parallel write&#=
39; ela=3D 255 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025228651=
912<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 51 timeout=3D296 p=
2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228652034<br>WAIT #0: nam=3D&#39;rdbms ipc=
 message&#39; ela=3D 16332 timeout=3D296 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025=
228668405<br>WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D 256 fi=
les=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025228668804<br>WAIT #0: n=
am=3D&#39;rdbms ipc message&#39; ela=3D 37 timeout=3D295 p2=3D0 p3=3D0 obj#=
=3D-1 tim=3D1025228668911<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=
=3D 16969 timeout=3D295 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228685907<br>WAIT=
 #0: nam=3D&#39;log file parallel write&#39; ela=3D 273 files=3D1 blocks=3D=
9 requests=3D1 obj#=3D-1 tim=3D1025228686327<br>WAIT #0: nam=3D&#39;rdbms i=
pc message&#39; ela=3D 38 timeout=3D293 p2=3D0 p3=3D0 obj#=3D-1 tim=3D10252=
28686453<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 6908 timeout=
=3D293 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228693390<br>WAIT #0: nam=3D&#39;l=
og file parallel write&#39; ela=3D 247 files=3D1 blocks=3D2 requests=3D1 ob=
j#=3D-1 tim=3D1025228693768<br>WAIT #0: nam=3D&#39;rdbms ipc message&#39; e=
la=3D 38 timeout=3D292 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025228693862j<br></di=
v><div>=C2=A0...more of the same</div><div>264396 WAIT #0: nam=3D&#39;rdbms=
 ipc message&#39; ela=3D 32 timeout=3D52 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025=
231090742<br>=C2=A0264397 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D=
 19460 timeout=3D52 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231110227<br>=C2=A026=
4398 WAIT #0: nam=3D&#39;log file parallel write&#39; ela=3D 272 files=3D1 =
blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231110633<br>=C2=A0264399 WAIT =
#0: nam=3D&#39;rdbms ipc message&#39; ela=3D 37 timeout=3D50 p2=3D0 p3=3D0 =
obj#=3D-1 tim=3D1025231110748<br>=C2=A0264400 WAIT #0: nam=3D&#39;rdbms ipc=
 message&#39; ela=3D 15484 timeout=3D50 p2=3D0 p3=3D0 obj#=3D-1 tim=3D10252=
31126261<br>=C2=A0264401 WAIT #0: nam=3D&#39;log file parallel write&#39; e=
la=3D 247 files=3D1 blocks=3D4 requests=3D1 obj#=3D-1 tim=3D1025231126641<b=
r>=C2=A0264402 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 36 timeout=
=3D49 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231126753<br>=C2=A0264403 WAIT #0: =
nam=3D&#39;rdbms ipc message&#39; ela=3D 16732 timeout=3D49 p2=3D0 p3=3D0 o=
bj#=3D-1 tim=3D1025231143514<br>=C2=A0264404 WAIT #0: nam=3D&#39;LGWR wait =
for redo copy&#39; ela=3D 31 copy latch #=3D8 p2=3D0 p3=3D0 obj#=3D-1 tim=
=3D1025231143603<br>=C2=A0264405 WAIT #0: nam=3D&#39;log file parallel writ=
e&#39; ela=3D 279 files=3D1 blocks=3D9 requests=3D1 obj#=3D-1 tim=3D1025231=
143976<br>=C2=A0264406 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 35=
 timeout=3D47 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231144093<br>=C2=A0264407 W=
AIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 5384 timeout=3D47 p2=3D0 p=
3=3D0 obj#=3D-1 tim=3D1025231149515<br>=C2=A0264408 WAIT #0: nam=3D&#39;log=
 file parallel write&#39; ela=3D 255 files=3D1 blocks=3D4 requests=3D1 obj#=
=3D-1 tim=3D1025231149889<br>=C2=A0264409 WAIT #0: nam=3D&#39;rdbms ipc mes=
sage&#39; ela=3D 32 timeout=3D47 p2=3D0 p3=3D0 obj#=3D-1 tim=3D102523114998=
9<br>=C2=A0264410<br>=C2=A0264411 *** 2020-01-29T13:04:23.802774-06:00<br>=
=C2=A0264412 WAIT #0: nam=3D&#39;rdbms ipc message&#39; ela=3D 460056 timeo=
ut=3D46 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025231610086<br>=C2=A0264413<br>=C2=
=A0264414 *** 2020-01-29T13:04:26.797925-06:00<br>=C2=A0264415 WAIT #0: nam=
=3D&#39;rdbms ipc message&#39; ela=3D 2995027 timeout=3D300 p2=3D0 p3=3D0 o=
bj#=3D-1 tim=3D1025234605238<br>=C2=A0264416 WAIT #0: nam=3D&#39;rdbms ipc =
message&#39; ela=3D 10649 timeout=3D1 p2=3D0 p3=3D0 obj#=3D-1 tim=3D1025234=
616054<br>=C2=A0264417<br></div><div><br></div><div>i will get you snapper =
lgwr output associated w/ this interval as soon as i can find it (can&#39;t=
 remember where i stashed it and lost trace file name .. fml)</div><div><br=
></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;=
border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204=
,204);padding-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr">=
<div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div di=
r=3D"ltr"><div></div><div>4) Paste the full <b>Snapper</b> output (of one t=
ime snapshot from the problem time) here if you can - we&#39;ll be able to =
see what the slaves were doing</div><div>=C2=A0 - You&#39;d need to include=
 not only LGWR SID but also any LG*=C2=A0slave SIDs too</div><div>=C2=A0 - =
Also we&#39;ll see better LGWR delay metrics from V$SESSTAT that go beyond =
just wait events</div><div>=C2=A0 - Thinking of metrics like these - snappe=
r already handles them (if you look into the rightmost column)</div><div><d=
iv>=C2=A0 =C2=A0 =C2=A0- <b>redo synch time overhead (usec)</b> - &quot;FG =
wakeup overhead per log file sync&quot;</div><div>=C2=A0 =C2=A0 =C2=A0- <b>=
redo write time</b> - accounting end to end redo write time (not just the l=
og file parallel write syscall) &quot;per redo write&quot;</div></div><div>=
<br></div><div>5) Regarding the <b>ASH wait chains</b> error on 19c - I&#39=
;m aware of this but forgot to look into it</div><div>=C2=A0 =C2=A0- It has=
 (probably) something to do with GV$ views (or PX slaves used for PDB V$ qu=
eries) and possibly the connect by I&#39;m using</div><div>=C2=A0 =C2=A0- D=
oes it work in the CDB (if multitenant)?</div><div>=C2=A0 =C2=A0- Does it w=
ork with lower optimizer features enabled?</div><div>=C2=A0 =C2=A0- As we a=
re already manually looking into what LGWR and its slaves are doing, there&=
#39;s no need for the wait chains script here</div><div><br></div></div></d=
iv></div></div></div></div></div></div></blockquote><div>***oracle has file=
d a bug. altering optimizer_features_enabled doesn&#39;t help.</div><div>=
=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0=
.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(20=
4,204,204);padding-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"=
ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><d=
iv dir=3D"ltr"><div></div><div>6) Linux Process Snapper - pSnapper</div><di=
v>=C2=A0 =C2=A0- If you want to see the full &quot;wait&quot; states of Lin=
ux processes/threads, use the<b> -a </b>option.</div><div>=C2=A0 =C2=A0- It=
&#39;s somewhat like always showing &quot;idle&quot; threads too</div><div>=
=C2=A0 =C2=A0- But for linux apps it&#39;s harder to determine what&#39;s i=
dle and what&#39;s some network RPC wait, so I currently just do what Linux=
 &quot;load&quot; metric does</div><div>=C2=A0 =C2=A0- I just shows threads=
 in Runnable and Disk/Uninterruptible sleep states by default (-a shows=C2=
=A0all states, including Sleep and kernel-Idle threads etc)</div><div>=C2=
=A0 =C2=A0- So the Linux pSnapper &quot;idle&quot; means a somewhat differe=
nt thing than Oracle &quot;idle&quot; - say LGWR was waiting for a Data Gua=
rd ACK, that should not be shown as Idle in Oracle, but at the Linux proces=
s level it&#39;s &quot;just some network socket wait&quot; and you&#39;d ne=
ed to use <b>-a</b>=C2=A0option to show all of them</div></div></div></div>=
</div></div></div></div></div></blockquote><div><br></div><div>*** will add=
 that option to future collections (and watch the video/read help)=C2=A0</d=
iv><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;bord=
er-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204=
);padding-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div=
 dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D=
"ltr"><div><br></div><div>7) Were LGWR/LG* processes mostly idle during the=
 problem time?</div><div>=C2=A0 =C2=A0- As I kept reading the thread, it lo=
oks like ASH reported LGWR being mostly idle during the problem time</div><=
div>=C2=A0 =C2=A0- This is not exactly the same pattern, but I&#39;ve seen =
cases in the past where everyone&#39;s waiting for LGWR and LGWR doesn&#39;=
t seem to be busy and doesn&#39;t show any significant waits</div><div>=C2=
=A0 =C2=A0- It turned out that LGWR was actually waiting for CKPT to make p=
rogress, but it wasn&#39;t synchronously waiting, but just going to sleep a=
nd &quot;checking again later&quot; in a loop</div><div>=C2=A0 =C2=A0- I wr=
ote about this here: <a href=3D"https://blog.tanelpoder.com/posts/log-file-=
switch-checkpoint-incomplete-and-lgwr-waiting-for-checkpoint-progress/" tar=
get=3D"_blank">https://blog.tanelpoder.com/posts/log-file-switch-checkpoint=
-incomplete-and-lgwr-waiting-for-checkpoint-progress/</a></div><div>=C2=A0 =
=C2=A0- So it may still make sense to check what LGWR/LG* processes are doi=
ng even if they don&#39;t seem to be active much (like do you see an occasi=
onal controlfile read wait by LGWR etc)</div><div>=C2=A0 =C2=A0- SQL*Trace =
would capture all activity of LGWR, but may slow things down further, so yo=
u could also sample &amp; spool V$SESSION_WAIT_HISTORY (that keeps a log of=
 last 10 waits of every session in memory)</div><div><br></div></div></div>=
</div></div></div></div></div></div></blockquote><div>*** i think i&#39;ve =
addressed most of this.</div><div>=C2=A0</div><blockquote class=3D"gmail_qu=
ote" style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-st=
yle:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir=3D"=
ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><d=
iv dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div></div><div>Incidental=
ly I&#39;m doing another hacking session tomorrow (Thursday 30 Jan @ 12pm E=
T) about complex Oracle performance problems that span multiple layers and =
have multiple moving parts. I can add some LGWR troubleshooting magic in th=
e end of that (here goes my plan to fit the session in 1.5h again!). I&#39;=
ll post the details in my blog in a few minutes.</div><div><br></div></div>=
</div></div></div></div></div></div></div></blockquote><div>*** can&#39;t w=
ait!!=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px =
0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:r=
gb(204,204,204);padding-left:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><div di=
r=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"ltr"><div dir=3D"lt=
r"><div dir=3D"ltr"><div></div><div>--</div><div>Tanel Poder</div><div><a h=
ref=3D"https://blog.tanelpoder.com/seminar" target=3D"_blank">https://blog.=
tanelpoder.com/seminar</a></div><div><br></div></div></div></div></div></di=
v></div><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr=
">On Tue, Jan 28, 2020 at 2:22 PM Chris Stephens &lt;<a href=3D"mailto:cste=
phens16@gmail.com" target=3D"_blank">cstephens16@gmail.com</a>&gt; wrote:<b=
r></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex=
;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,20=
4,204);padding-left:1ex"><div dir=3D"ltr"><div><font face=3D"monospace">doe=
sn&#39;t look like any ash data for lgwr:</font></div><div><font face=3D"mo=
nospace"><br></font></div><font face=3D"monospace">SQL&gt; @ashtop session_=
id,blocking_session,event2 1=3D1 &quot;TIMESTAMP&#39;2020-01-27 15:50:48&#3=
9;&quot; &quot;TIMESTAMP&#39;2020-01-27 15:52:40&#39;&quot;<br>=C2=A0 =C2=
=A0TOTALSECONDS =C2=A0 =C2=A0AAS =C2=A0 =C2=A0 =C2=A0%This =C2=A0 =C2=A0SES=
SION_ID =C2=A0 =C2=A0BLOCKING_SESSION =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0EVENT2 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 FIRST_=
SEEN =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0LAST_SEEN =C2=A0 =C2=
=A0DIST_SQLEXEC_SEEN<br>_______________ ______ __________ _____________ ___=
________________ _____________________ ______________________ _____________=
_________ ____________________<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A081 =C2=A0 =C2=A00.7 =C2=A0 82% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 1713 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01710 log=
 file sync =C2=A0 =C2=A0 =C2=A0 =C2=A0 2020-01-27 15:50:54 =C2=A0 =C2=A0202=
0-01-27 15:52:14 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
5 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A05% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 1221 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A020=
20-01-27 15:50:59 =C2=A0 =C2=A02020-01-27 15:52:14 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 3 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A03% | =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0246 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ON CPU =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-27 15:51:00 =C2=A0 =C2=A02020=
-01-27 15:52:00 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 3 =
=C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A03% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 1713 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-0=
1-27 15:50:51 =C2=A0 =C2=A02020-01-27 15:52:27 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2<br>=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0123 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-27 15:51:35 =C2=A0 =C2=A02020-01-=
27 15:51:35 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=
=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0127 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A020=
20-01-27 15:51:50 =C2=A0 =C2=A02020-01-27 15:51:50 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0252 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02321 latch free =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A02020-01-27 15:52:28 =C2=A0 =C2=A02020-01-27 15:52:28 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1=
<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A0=
0 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0978 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ges remo=
te message =C2=A0 =C2=A02020-01-27 15:50:48 =C2=A0 =C2=A02020-01-27 15:50:4=
8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =
=C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A09=
83 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 la=
tch free =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02020-01-27 15:52:28 =C2=
=A0 =C2=A02020-01-27 15:52:28 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 1713 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 library cache lock =C2=A0 =C2=A02020-01-27 15:50:48 =
=C2=A0 =C2=A02020-01-27 15:50:48 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A00 =C2=A0 =C2=A01% | =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 1831 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 ON CPU =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A02020-01-27 15:50:49 =C2=A0 =C2=A02020-01-27 15:50:49 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1<br>=
<br><br>11 rows selected.<br><br><br>SQL&gt; @ashtop session_id,blocking_se=
ssion,event2 session_id=3D1710 &quot;TIMESTAMP&#39;2020-01-27 15:50:48&#39;=
&quot; &quot;TIMESTAMP&#39;2020-01-27 15:52:40&#39;&quot;<br><br>no rows se=
lected</font><br></div><br><div class=3D"gmail_quote"><div dir=3D"ltr" clas=
s=3D"gmail_attr">On Tue, Jan 28, 2020 at 1:09 PM Noveljic Nenad &lt;<a href=
=3D"mailto:nenad.noveljic@vontobel.com" target=3D"_blank">nenad.noveljic@vo=
ntobel.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=
=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;=
border-left-color:rgb(204,204,204);padding-left:1ex">




<div>
You=E2=80=99re using v0.2, but =C2=A0v0.6 should be the latest version. 0.2=
 wouldn=E2=80=99t show idle blockers.
<div><br>
</div>
<div>I looked at the ashtop output once again. Strange that lgwr processes =
nor any other bg processes show up significantly there. The question is, wh=
at was lgwr doing while the fg processes were waiting on him.<br>
<br></div>

</div>

</blockquote></div>
</blockquote></div></div></div>
</blockquote></div></div>
</blockquote></div>
</blockquote></div></div>
</blockquote></div>
</blockquote></div>

--0000000000001b00b1059d9f0f0f--
--
http://www.freelists.org/webpage/oracle-l


