Re: intermittent long "log file sync" waits

From: Chris Stephens <cstephens16_at_gmail.com>
Date: Wed, 29 Jan 2020 10:05:56 -0600
Message-ID: <CAEFL0syisjz+TgF5FC3nn0CBZ35jgawhBBz_=7oSjnYOZ3p1sA_at_mail.gmail.com>



ugh. it's still occurring. i don't think it's related to log file switches:

SQL> _at_log_switches

Date                    INST_ID   Day    Total   h0   h1   h2   h3   h4   h5   h6   h7   h8-h23     Avg
_____________________  ________  _____  _______ ____ ____ ____ ____ ____ ____ ____ ____ ________ _______
2020-01-29 00:00:00           1   Wed        10    1    1    3    1    1    1    1    1        0    0.42
2020-01-29 00:00:00           2   Wed        13    2    1    3    1    2    2    1    1        0    0.54
2020-01-29 00:00:00           3   Wed        14    2    1    4    1    2    1    1    2        0    0.58

(h8 through h23 are 0 on all three instances)



SQL> _at_ashtop event2 1=1 sysdate-5/60/24 sysdate
   TOTALSECONDS    AAS   %This    EVENT2                            FIRST_SEEN             LAST_SEEN              DIST_SQLEXEC_SEEN
_______________ ______ _________ _________________________________ ______________________ ______________________ ____________________
            123    0.4    52% |   ON CPU                            2020-01-29 09:04:01    2020-01-29 09:08:51                   48
             96    0.3    41% |   log file sync                     2020-01-29 09:06:04    2020-01-29 09:08:59                    1
              3      0     1% |   DLM cross inst call completion    2020-01-29 09:04:59    2020-01-29 09:07:23                    3
              3      0     1% |   PX Deq: Slave Session Stats       2020-01-29 09:05:30    2020-01-29 09:06:58                    1
              2      0     1% |   ASM file metadata operation       2020-01-29 09:05:18    2020-01-29 09:06:03                    2
              2      0     1% |   library cache lock                2020-01-29 09:04:54    2020-01-29 09:07:24                    2
              1      0     0% |   Disk file operations I/O          2020-01-29 09:05:09    2020-01-29 09:05:09                    1
              1      0     0% |   KSV master wait                   2020-01-29 09:05:18    2020-01-29 09:05:18                    1
              1      0     0% |   PX Deq: Join ACK                  2020-01-29 09:08:01    2020-01-29 09:08:01                    1
              1      0     0% |   PX Deq: reap credit               2020-01-29 09:06:58    2020-01-29 09:06:58                    1
              1      0     0% |   control file parallel write       2020-01-29 09:05:06    2020-01-29 09:05:06                    1
              1      0     0% |   db file async I/O submit          2020-01-29 09:05:37    2020-01-29 09:05:37                    1
              1      0     0% |   enq: PS - contention [mode=6]     2020-01-29 09:05:30    2020-01-29 09:05:30                    1
              1      0     0% |   library cache pin                 2020-01-29 09:04:56    2020-01-29 09:04:56                    1

14 rows selected.

SQL> _at_ashtop event2,program 1=1 sysdate-5/60/24 sysdate

   TOTALSECONDS    AAS   %This    EVENT2                            PROGRAM                                      FIRST_SEEN             LAST_SEEN              DIST_SQLEXEC_SEEN
_______________ ______ _________ _________________________________ ____________________________________________ ______________________ ______________________ ____________________
             85    0.3    34% |   log file sync                     python_at_lsst-verify-worker28 (TNS V1-V3)     2020-01-29 09:07:36    2020-01-29 09:09:24                    1
             56    0.2    22% |   ON CPU                            sqlplus_at_lsst-oradb05 (TNS V1-V3)            2020-01-29 09:05:15    2020-01-29 09:09:54                   24
             20    0.1     8% |   log file sync                     sqlplus_at_lsst-verify-worker28 (TNS V1-V3)    2020-01-29 09:06:04    2020-01-29 09:10:02                    1
             15    0.1     6% |   ON CPU                            python_at_lsst-verify-worker28 (TNS V1-V3)     2020-01-29 09:06:22    2020-01-29 09:09:21                   13
             11      0     4% |   ON CPU                            sqlplus_at_lsst-verify-worker28 (TNS V1-V3)    2020-01-29 09:06:01    2020-01-29 09:10:04                    8
              5      0     2% |   ON CPU                            oracle_at_lsst-oradb05 (DIA0)                  2020-01-29 09:07:57    2020-01-29 09:10:02                    1
              5      0     2% |   ON CPU                            oracle_at_lsst-oradb06 (CJQ0)                  2020-01-29 09:09:01    2020-01-29 09:09:05                    1
              5      0     2% |   ON CPU                            oracle_at_lsst-oradb06 (PPA7)                  2020-01-29 09:06:46    2020-01-29 09:09:54                    1
              4      0     2% |   ON CPU                            oracle_at_lsst-oradb04 (CJQ0)                  2020-01-29 09:05:05    2020-01-29 09:05:08                    1
              4      0     2% |   ON CPU                            oracle_at_lsst-oradb05 (LMON)                  2020-01-29 09:06:37    2020-01-29 09:09:38                    1
              4      0     2% |   ON CPU                            oracle_at_lsst-oradb06 (LMON)                  2020-01-29 09:05:06    2020-01-29 09:09:06                    1
              3      0     1% |   PX Deq: Slave Session Stats       oracle_at_lsst-oradb05 (PPA7)                  2020-01-29 09:05:30    2020-01-29 09:06:58                    1
              2      0     1% |   DLM cross inst call completion    sqlplus_at_lsst-verify-worker28 (TNS V1-V3)    2020-01-29 09:06:02    2020-01-29 09:07:23                    2
              2      0     1% |   ON CPU                            oracle_at_lsst-oradb04 (PSP0)                  2020-01-29 09:06:17    2020-01-29 09:06:18                    1
              2      0     1% |   ON CPU                            oracle_at_lsst-oradb05 (MMON)                  2020-01-29 09:05:17    2020-01-29 09:05:47                    2


15 rows selected.

SQL> _at_ashtop session_id,session_serial#,event2,program 1=1 sysdate-5/60/24 sysdate

   TOTALSECONDS    AAS   %This    SESSION_ID   SESSION_SERIAL#   EVENT2                            PROGRAM                                      FIRST_SEEN             LAST_SEEN              DIST_SQLEXEC_SEEN
_______________ ______ _________ ____________ _________________ _________________________________ ____________________________________________ ______________________ ______________________ ____________________
             85    0.3    36% |         2078             23219   log file sync                     python_at_lsst-verify-worker28 (TNS V1-V3)     2020-01-29 09:07:36    2020-01-29 09:09:24                    1
             56    0.2    24% |         2085             56305   ON CPU                            sqlplus_at_lsst-oradb05 (TNS V1-V3)            2020-01-29 09:06:44    2020-01-29 09:11:34                   25
              6      0     3% |         2078              5162   log file sync                     sqlplus_at_lsst-verify-worker28 (TNS V1-V3)    2020-01-29 09:09:57    2020-01-29 09:10:02                    1
              6      0     3% |         2078             24706   ON CPU                            python_at_lsst-verify-worker28 (TNS V1-V3)     2020-01-29 09:10:12    2020-01-29 09:10:39                    2
              5      0     2% |            4             53890   ON CPU                            oracle_at_lsst-oradb06 (CJQ0)                  2020-01-29 09:09:01    2020-01-29 09:09:05                    1
              5      0     2% |          123             45272   ON CPU                            oracle_at_lsst-oradb05 (DIA0)                  2020-01-29 09:07:57    2020-01-29 09:10:02                    1
              5      0     2% |         2078             59858   ON CPU                            python_at_lsst-verify-worker28 (TNS V1-V3)     2020-01-29 09:06:43    2020-01-29 09:06:52                    5
              4      0     2% |          246              4525   ON CPU                            oracle_at_lsst-oradb05 (LMON)                  2020-01-29 09:07:37    2020-01-29 09:11:38                    1
              4      0     2% |         2078              5162   ON CPU                            sqlplus_at_lsst-verify-worker28 (TNS V1-V3)    2020-01-29 09:09:56    2020-01-29 09:10:05                    3
              4      0     2% |         2078             23219   ON CPU                            python_at_lsst-verify-worker28 (TNS V1-V3)     2020-01-29 09:09:10    2020-01-29 09:09:21                    4
              3      0     1% |          246              5232   ON CPU                            oracle_at_lsst-oradb06 (LMON)                  2020-01-29 09:09:06    2020-01-29 09:11:06                    1
              3      0     1% |         2078             20119   ON CPU                            sqlplus_at_lsst-verify-worker28 (TNS V1-V3)    2020-01-29 09:07:25    2020-01-29 09:07:27                    2
              2      0     1% |          123             25437   ON CPU                            oracle_at_lsst-oradb06 (DIA0)                  2020-01-29 09:08:20    2020-01-29 09:11:22                    1
              2      0     1% |          368             63897   ON CPU                            oracle_at_lsst-oradb06 (LMD0)                  2020-01-29 09:07:29    2020-01-29 09:09:10                    1
              2      0     1% |          489             23602   ON CPU                            oracle_at_lsst-oradb06 (PSP0)                  2020-01-29 09:07:03    2020-01-29 09:09:08                    1

15 rows selected.

"_at_snapper ash,stats,trace 10 999999 lgwr" shows no active session (v$session sampling) output during FIRST_SEEN - LAST_SEEN interval above.

i'm going to start looping psnapper on all LG* processes and hope that turns something up.
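
maybe something like this (a rough sketch -- the pids are the lgwr/lg00/lg01 pids from the "ps -ef | egrep lg" listing further down this thread, so they will differ per node, and psn samples for 5 seconds by default):

while true
do
    for pid in 26552 26556 26560    # ora_lgwr / ora_lg00 / ora_lg01 on lsst2db2
    do
        echo "=== $(date) pid $pid" >> /tmp/psn.$pid
        ./psn -p $pid -g comm,state,syscall >> /tmp/psn.$pid
    done
done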

still *very* open to any other suggestions. this is driving me crazy.

On Wed, Jan 29, 2020 at 9:00 AM Noveljic Nenad <nenad.noveljic_at_vontobel.com> wrote:

> Yes, it was strange to see rman backups in all your reports. We should
> have paid more attention to it. But there were also periods when the backup
> was running without impacting log file sync.
>
>
>
> >possibly locks on CF?
>
>
>
> Locks on CF are typically held during log switches. You can correlate log
> switch times with the log file sync problem to answer this. I think
> Jonathan has already suggested that.
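>
> For a quick check you could, for example, grep the switch messages out of
> the alert log (with a couple of lines of context for the timestamps) and
> compare them with the FIRST_SEEN/LAST_SEEN windows of the log file sync
> waits. A rough sketch - the diag path is only the usual default and the
> db/instance names are taken from earlier in this thread, so adjust as
> needed:
>
> grep -B2 "advanced to log sequence" \
>   $ORACLE_BASE/diag/rdbms/lsst2db/lsst2db2/trace/alert_lsst2db2.log | tail -60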
>
>
>
> >not sure why that wouldn't show up on ASH though.
>
>
>
> That’s a big question – I’ve never seen that before. pSnapper would
> show what the lg* processes were doing.
>
>
>
> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On
> Behalf Of *Chris Stephens
> *Sent:* Wednesday, 29 January 2020 15:49
> *To:* Stefan Koehler <contact_at_soocs.de>
> *Cc:* oracle-l <Oracle-L_at_freelists.org>
> *Subject:* Re: intermittent long "log file sync" waits
>
>
>
> holy shnickees, psnapper is awesome. using pstack, i had already found quite
> a few stuck backup-related processes before pulling down and using psnapper.
> after killing those sessions, the remaining rman processes showed progress,
> w/ psnapper output like:
>
>
>
> psnapper]# ./psn -p 874 -g comm,state,syscall
>
> Linux Process Snapper v0.14 by Tanel Poder [https://tp.dev/psnapper]
> Sampling /proc/syscall, stat for 5 seconds... finished.
>
>
> === Active Threads ========================================================
>
>  samples | avg_threads | comm            | state                  | syscall
> ------------------------------------------------------------------------------
>       43 |        0.43 | (oracle_*_lsst) | Disk (Uninterruptible) | pread64
>       31 |        0.31 | (oracle_*_lsst) | Disk (Uninterruptible) | lstat
>        7 |        0.07 | (oracle_*_lsst) | Running (ON CPU)       | pread64
>        6 |        0.06 | (oracle_*_lsst) | Running (ON CPU)       | fstatfs
>        3 |        0.03 | (oracle_*_lsst) | Disk (Uninterruptible) | [running]
>        3 |        0.03 | (oracle_*_lsst) | Disk (Uninterruptible) | open
>        3 |        0.03 | (oracle_*_lsst) | Running (ON CPU)       | [running]
>        1 |        0.01 | (oracle_*_lsst) | Disk (Uninterruptible) | fstatfs
>        1 |        0.01 | (oracle_*_lsst) | Running (ON CPU)       | open
>
>
>
> it turns out we (i) copied the backup scripts and cron entries over to
> another cluster i built a few weeks ago. i thought i had replaced all
> references to the database experiencing the issues, but that was not the
> case. i killed all processes from the backup scripts kicked off on the new
> RAC that were connecting to the problem RAC, and now ASH output is looking
> much better.
>
>
>
> i will continue to monitor for unreasonable "log file sync" waits but i
> suspect this was the cause of the issue. possibly locks on CF? not sure why
> that wouldn't show up on ASH though.
>
>
>
> On Wed, Jan 29, 2020 at 2:13 AM Stefan Koehler <contact_at_soocs.de> wrote:
>
> Hello Chris,
> as I already mentioned - please use pSnapper (
> https://github.com/tanelpoder/psnapper ) by Tanel Poder.
>
> pSnapper samples the /proc file system and does not suspend the process,
> unlike pstack (which is just a wrapper around gdb and uses ptrace() under
> the hood). Suspending a process can screw it up :)
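>
> For illustration, the sampling itself is nothing more exotic than reading
> those proc files in a loop. A rough sketch (not the actual psn code; 26552
> is just the lgwr pid from the ps listing quoted further down):
>
> pid=26552
> while true; do
>     # field 3 of /proc/<pid>/stat is the task state (R, S, D, ...)
>     awk '{print "state: " $3}' /proc/$pid/stat
>     # /proc/<pid>/syscall starts with the number of the syscall the task
>     # is blocked in (or the word "running" when it is on CPU)
>     cat /proc/$pid/syscall
>     sleep 0.1
> done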
>
> In addition, pSnapper provides more useful information (e.g. the task state,
> the C function currently running in kernel space for a specific syscall, etc.).
>
> Best Regards
> Stefan Koehler
>
> Independent Oracle performance consultant and researcher
> Website: http://www.soocs.de
> Twitter: _at_OracleSK
>
> > Chris Stephens <cstephens16_at_gmail.com> wrote on 28 January 2020 at 23:49:
> >
> > was just getting ready to sign off and noticed the archivelog backup
> scheduled to run every hour seems to be stuck and has been for at least 10
> mins:
> >
> > [oracle_at_lsst-oradb04 ~]$ ps -ef | grep oracle_backup.sh
> > oracle   13369 31167  0 16:43 pts/0    00:00:00 grep --color=auto
> oracle_backup.sh
> > oracle   14649 14645  0 16:00 ?        00:00:00 /bin/bash
> /home/oracle/scripts/rman/oracle_backup.sh -d lsst2db -s archivelog.rman -c
> lsst2db_rman -e cs2018_at_ncsa.illinois.edu -r iddsdba_rman
> > [oracle_at_lsst-oradb04 ~]$ pstack 15649
> > Process 15649 not found.
> > [oracle_at_lsst-oradb04 ~]$ pstack 14649
> > #0  0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
> > #1  0x0000000000440b84 in waitchld.isra.10 ()
> > #2  0x0000000000441e3c in wait_for ()
> > #3  0x0000000000433b0e in execute_command_internal ()
> > #4  0x0000000000433d2e in execute_command ()
> > #5  0x000000000041e365 in reader_loop ()
> > #6  0x000000000041c9ce in main ()
> > [oracle_at_lsst-oradb04 ~]$ pstack 14649
> > #0  0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
> > #1  0x0000000000440b84 in waitchld.isra.10 ()
> > #2  0x0000000000441e3c in wait_for ()
> > #3  0x0000000000433b0e in execute_command_internal ()
> > #4  0x0000000000433d2e in execute_command ()
> > #5  0x000000000041e365 in reader_loop ()
> > #6  0x000000000041c9ce in main ()
> > [oracle_at_lsst-oradb04 ~]$ pstack 14649
> > #0  0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
> > #1  0x0000000000440b84 in waitchld.isra.10 ()
> > #2  0x0000000000441e3c in wait_for ()
> > #3  0x0000000000433b0e in execute_command_internal ()
> > #4  0x0000000000433d2e in execute_command ()
> > #5  0x000000000041e365 in reader_loop ()
> > #6  0x000000000041c9ce in main ()
> > [oracle_at_lsst-oradb04 ~]$ pstack 14649
> > #0  0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
> > #1  0x0000000000440b84 in waitchld.isra.10 ()
> > #2  0x0000000000441e3c in wait_for ()
> > #3  0x0000000000433b0e in execute_command_internal ()
> > #4  0x0000000000433d2e in execute_command ()
> > #5  0x000000000041e365 in reader_loop ()
> > #6  0x000000000041c9ce in main ()
> > [oracle_at_lsst-oradb04 ~]$
> >
> >
> > On Tue, Jan 28, 2020 at 4:30 PM Chris Stephens < cstephens16_at_gmail.com>
> wrote:
> > > ok. i've had the following running since lunchtime and will let it run
> > > through the night in hopes that the issue occurs again:
> > >
> > > _at_snapper ash,stats,trace 10 999999 lgwr
> > >
> > > I will run the following script as well:
> > >
> > > #! /bin/bash
> > >
> > > while [ 1 ]
> > > do
> > >     echo `date` >> /tmp/pstack.26552
> > >     pstack 26552 >> /tmp/pstack.26552
> > >     echo `date` >> /tmp/pstack.26556
> > >     pstack 26556 >> /tmp/pstack.26556
> > >     echo `date` >> /tmp/pstack.26560
> > >     pstack 26560 >> /tmp/pstack.26560
> > >     sleep 5
> > > done
> > >
> > > based off:
> > >
> > > [oracle_at_lsst-oradb05 bin]$ ps -ef | egrep "lg.*lsst2db2"
> > > oracle   26552     1  0 Jan17 ?        00:06:14 ora_lgwr_lsst2db2
> > > oracle   26556     1  0 Jan17 ?        00:00:13 ora_lg00_lsst2db2
> > > oracle   26560     1  0 Jan17 ?        00:00:12 ora_lg01_lsst2db2
> > >
> > > All sessions for this workload are connecting to a service w/ tracing
> > > enabled, so we'll have trace data as well.
> > >
> > > I will (hopefully) have updates in the morning.
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jan 29 2020 - 17:05:56 CET
