Re: intermittent long "log file sync" waits
From: Chris Stephens <cstephens16_at_gmail.com>
Date: Wed, 29 Jan 2020 10:05:56 -0600
Message-ID: <CAEFL0syisjz+TgF5FC3nn0CBZ35jgawhBBz_=7oSjnYOZ3p1sA_at_mail.gmail.com>
Date: Wed, 29 Jan 2020 10:05:56 -0600
Message-ID: <CAEFL0syisjz+TgF5FC3nn0CBZ35jgawhBBz_=7oSjnYOZ3p1sA_at_mail.gmail.com>
ugh. its still occurring. i don't think it is related to log file switches:
SQL> _at_log_switches
Date INST_ID Day Total h0 h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 h14 h15 h16 h17 h18 h19 h20 h21 h22 h23 Avg ______________________ __________ ______ ________ _____ _____ _____ __________ _____ _____ _____ _____ _____ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ _______
2020-01-29 00:00:00 1 Wed 10 1 1 3 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.42 2020-01-29 00:00:00 2 Wed 13 2 1 3 1 2 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.54 2020-01-29 00:00:00 3 Wed 14 2 1 4 1 2 1 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.58 SQL> _at_ashtop event2 1=1 sysdate-5/60/24 sysdate TOTALSECONDS AAS %This EVENT2 FIRST_SEEN LAST_SEEN DIST_SQLEXEC_SEEN _______________ ______ __________ _________________________________
______________________ ______________________ ____________________
123 0.4 52% | ON CPU 2020-01-29 09:04:01 2020-01-29 09:08:51 48 96 0.3 41% | log file sync 2020-01-29 09:06:04 2020-01-29 09:08:59 1 3 0 1% | DLM cross inst call completion 2020-01-29 09:04:59 2020-01-29 09:07:23 3 3 0 1% | PX Deq: Slave Session Stats 2020-01-29 09:05:30 2020-01-29 09:06:58 1 2 0 1% | ASM file metadata operation 2020-01-29 09:05:18 2020-01-29 09:06:03 2 2 0 1% | library cache lock 2020-01-29 09:04:54 2020-01-29 09:07:24 2 1 0 0% | Disk file operations I/O 2020-01-29 09:05:09 2020-01-29 09:05:09 1 1 0 0% | KSV master wait 2020-01-29 09:05:18 2020-01-29 09:05:18 1 1 0 0% | PX Deq: Join ACK 2020-01-29 09:08:01 2020-01-29 09:08:01 1 1 0 0% | PX Deq: reap credit 2020-01-29 09:06:58 2020-01-29 09:06:58 1 1 0 0% | control file parallel write 2020-01-29 09:05:06 2020-01-29 09:05:06 1 1 0 0% | db file async I/O submit 2020-01-29 09:05:37 2020-01-29 09:05:37 1 1 0 0% | enq: PS - contention [mode=6] 2020-01-29 09:05:30 2020-01-29 09:05:30 1 1 0 0% | library cache pin 2020-01-29 09:04:56 2020-01-29 09:04:56 1
14 rows selected.
SQL> _at_ashtop event2,program 1=1 sysdate-5/60/24 sysdate
TOTALSECONDS AAS %This EVENT2 PROGRAM FIRST_SEEN LAST_SEEN DIST_SQLEXEC_SEEN _______________ ______ __________ _________________________________ ___________________________________________ ______________________ ______________________ ____________________ 85 0.3 34% | log file sync python_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:07:36 2020-01-29 09:09:24 1 56 0.2 22% | ON CPU sqlplus_at_lsst-oradb05 (TNS V1-V3) 2020-01-29 09:05:15 2020-01-29 09:09:54 24 20 0.1 8% | log file sync sqlplus_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:06:04 2020-01-29 09:10:02 1 15 0.1 6% | ON CPU python_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:06:22 2020-01-29 09:09:21 13 11 0 4% | ON CPU sqlplus_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:06:01 2020-01-29 09:10:04 8 5 0 2% | ON CPU oracle_at_lsst-oradb05 (DIA0) 2020-01-29 09:07:57 2020-01-29 09:10:02 1 5 0 2% | ON CPU oracle_at_lsst-oradb06 (CJQ0) 2020-01-29 09:09:01 2020-01-29 09:09:05 1 5 0 2% | ON CPU oracle_at_lsst-oradb06 (PPA7) 2020-01-29 09:06:46 2020-01-29 09:09:54 1 4 0 2% | ON CPU oracle_at_lsst-oradb04 (CJQ0) 2020-01-29 09:05:05 2020-01-29 09:05:08 1 4 0 2% | ON CPU oracle_at_lsst-oradb05 (LMON) 2020-01-29 09:06:37 2020-01-29 09:09:38 1 4 0 2% | ON CPU oracle_at_lsst-oradb06 (LMON) 2020-01-29 09:05:06 2020-01-29 09:09:06 1 3 0 1% | PX Deq: Slave Session Stats oracle_at_lsst-oradb05 (PPA7) 2020-01-29 09:05:30 2020-01-29 09:06:58 1 2 0 1% | DLM cross inst call completion sqlplus_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:06:02 2020-01-29 09:07:23 2 2 0 1% | ON CPU oracle_at_lsst-oradb04 (PSP0) 2020-01-29 09:06:17 2020-01-29 09:06:18 1 2 0 1% | ON CPU oracle_at_lsst-oradb05 (MMON) 2020-01-29 09:05:17 2020-01-29 09:05:47 2
15 rows selected.
SQL> _at_ashtop session_id,session_serial#,event2,program 1=1 sysdate-5/60/24 sysdate
TOTALSECONDS AAS %This SESSION_ID SESSION_SERIAL# EVENT2 PROGRAM FIRST_SEEN LAST_SEEN DIST_SQLEXEC_SEEN _______________ ______ __________ _____________ __________________ ________________ ___________________________________________
______________________ ______________________ ____________________
85 0.3 36% | 2078 23219 log file sync python_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:07:36 2020-01-29 09:09:24 1 56 0.2 24% | 2085 56305 ON CPU sqlplus_at_lsst-oradb05 (TNS V1-V3) 2020-01-29 09:06:44 2020-01-29 09:11:34 25 6 0 3% | 2078 5162 log file sync sqlplus_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:09:57 2020-01-29 09:10:02 1 6 0 3% | 2078 24706 ON CPU python_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:10:12 2020-01-29 09:10:39 2 5 0 2% | 4 53890 ON CPU oracle_at_lsst-oradb06 (CJQ0) 2020-01-29 09:09:01 2020-01-29 09:09:05 1 5 0 2% | 123 45272 ON CPU oracle_at_lsst-oradb05 (DIA0) 2020-01-29 09:07:57 2020-01-29 09:10:02 1 5 0 2% | 2078 59858 ON CPU python_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:06:43 2020-01-29 09:06:52 5 4 0 2% | 246 4525 ON CPU oracle_at_lsst-oradb05 (LMON) 2020-01-29 09:07:37 2020-01-29 09:11:38 1 4 0 2% | 2078 5162 ON CPU sqlplus_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:09:56 2020-01-29 09:10:05 3 4 0 2% | 2078 23219 ON CPU python_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:09:10 2020-01-29 09:09:21 4 3 0 1% | 246 5232 ON CPU oracle_at_lsst-oradb06 (LMON) 2020-01-29 09:09:06 2020-01-29 09:11:06 1 3 0 1% | 2078 20119 ON CPU sqlplus_at_lsst-verify-worker28 (TNS V1-V3) 2020-01-29 09:07:25 2020-01-29 09:07:27 2 2 0 1% | 123 25437 ON CPU oracle_at_lsst-oradb06 (DIA0) 2020-01-29 09:08:20 2020-01-29 09:11:22 1 2 0 1% | 368 63897 ON CPU oracle_at_lsst-oradb06 (LMD0) 2020-01-29 09:07:29 2020-01-29 09:09:10 1 2 0 1% | 489 23602 ON CPU oracle_at_lsst-oradb06 (PSP0) 2020-01-29 09:07:03 2020-01-29 09:09:08 1
15 rows selected.
"_at_snapper ash,stats,trace 10 999999 lgwr" shows no active session (v$session sampling) output during FIRST_SEEN - LAST_SEEN interval above.
i'm going to start looping psnapper on all LG* processes and hope that turns something up.
still *very* open to any other suggestions. this is driving me crazy.
On Wed, Jan 29, 2020 at 9:00 AM Noveljic Nenad <nenad.noveljic_at_vontobel.com> wrote:
> Yes, it was strange to see rman backups in all your reports. We should > have paid more attention to it. But there were also periods when the backup > was running without impacting log file sync. > > > > >possibly locks on CF? > > > > Locks on CF are typically held during log switches. You can correlate log > switch times with the log file sync problem to answer this. I think > Jonathan has already suggested that. > > > > >not sure why that wouldn't show up on ASH though. > > > > That’s a big question – I’ve never seen that before. pSnapper would > provide the information what the lg* were doing. > > > > *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On > Behalf Of *Chris Stephens > *Sent:* Mittwoch, 29. Januar 2020 15:49 > *To:* Stefan Koehler <contact_at_soocs.de> > *Cc:* oracle-l <Oracle-L_at_freelists.org> > *Subject:* Re: intermittent long "log file sync" waits > > > > holy shnickees psnapper is awesome. i discovered quite a few backup > related processes that were stuck w/ pstack prior to pulling down and using > psnapper. after killing those sessions, the remaining rman processes showed > progress w/ psnapper output like: > > > > psnapper]# ./psn -p 874 -g comm,state,syscall > > Linux Process Snapper v0.14 by Tanel Poder [https://tp.dev/psnapper] > Sampling /proc/syscall, stat for 5 seconds... finished. > > > === Active Threads > =========================================================== > > samples | avg_threads | comm | state | syscall > > ------------------------------------------------------------------------------ > 43 | 0.43 | (oracle_*_lsst) | Disk (Uninterruptible) | pread64 > 31 | 0.31 | (oracle_*_lsst) | Disk (Uninterruptible) | lstat > 7 | 0.07 | (oracle_*_lsst) | Running (ON CPU) | pread64 > 6 | 0.06 | (oracle_*_lsst) | Running (ON CPU) | fstatfs > 3 | 0.03 | (oracle_*_lsst) | Disk (Uninterruptible) | > [running] > 3 | 0.03 | (oracle_*_lsst) | Disk (Uninterruptible) | open > 3 | 0.03 | (oracle_*_lsst) | Running (ON CPU) | > [running] > 1 | 0.01 | (oracle_*_lsst) | Disk (Uninterruptible) | fstatfs > 1 | 0.01 | (oracle_*_lsst) | Running (ON CPU) | open > > > > it turns out we (i) copied over backup scripts and cron entries to another > cluster i built up a few weeks ago. i thought i successfully replaced all > references to database experiencing issues but that was not the case. I > killed all processes related to backups scripts kicked off from this new > RAC connecting to problem RAC and now ASH output is looking much better. > > > > i will continue to monitor for unreasonable "log file sync" waits but i > suspect this was the cause of the issue. possibly locks on CF? not sure why > that wouldn't show up on ASH though. > > > > On Wed, Jan 29, 2020 at 2:13 AM Stefan Koehler <contact_at_soocs.de> wrote: > > Hello Chris, > as I already mentioned - please use pSnapper ( > https://github.com/tanelpoder/psnapper ) by Tanel Poder. > > pSnapper samples /proc file system and does not suspend the process like > pstack (as it is just a wrapper around gdb which uses ptrace() under the > hood). Suspending a process can screw it up :) > > In addition pSnapper provides more useful information (e.g. state, C > function that runs in kernel space for a specific syscall, etc.). > > Best Regards > Stefan Koehler > > Independent Oracle performance consultant and researcher > Website: http://www.soocs.de > Twitter: _at_OracleSK > > > Chris Stephens <cstephens16_at_gmail.com> hat am 28. Januar 2020 um 23:49 > geschrieben: > > > > was just getting ready to sign off and noticed the archivelog backup > scheduled to run every hour seems to be stuck and has been for at least 10 > mins: > > > > [oracle_at_lsst-oradb04 ~]$ ps -ef | grep oracle_backup.sh > > oracle 13369 31167 0 16:43 pts/0 00:00:00 grep --color=auto > oracle_backup.sh > > oracle 14649 14645 0 16:00 ? 00:00:00 /bin/bash > /home/oracle/scripts/rman/oracle_backup.sh -d lsst2db -s archivelog.rman -c > lsst2db_rman -e cs2018_at_ncsa.illinois.edu -r iddsdba_rman > > [oracle_at_lsst-oradb04 ~]$ pstack 15649 > > Process 15649 not found. > > [oracle_at_lsst-oradb04 ~]$ pstack 14649 > > #0 0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6 > > #1 0x0000000000440b84 in waitchld.isra.10 () > > #2 0x0000000000441e3c in wait_for () > > #3 0x0000000000433b0e in execute_command_internal () > > #4 0x0000000000433d2e in execute_command () > > #5 0x000000000041e365 in reader_loop () > > #6 0x000000000041c9ce in main () > > [oracle_at_lsst-oradb04 ~]$ pstack 14649 > > #0 0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6 > > #1 0x0000000000440b84 in waitchld.isra.10 () > > #2 0x0000000000441e3c in wait_for () > > #3 0x0000000000433b0e in execute_command_internal () > > #4 0x0000000000433d2e in execute_command () > > #5 0x000000000041e365 in reader_loop () > > #6 0x000000000041c9ce in main () > > [oracle_at_lsst-oradb04 ~]$ pstack 14649 > > #0 0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6 > > #1 0x0000000000440b84 in waitchld.isra.10 () > > #2 0x0000000000441e3c in wait_for () > > #3 0x0000000000433b0e in execute_command_internal () > > #4 0x0000000000433d2e in execute_command () > > #5 0x000000000041e365 in reader_loop () > > #6 0x000000000041c9ce in main () > > [oracle_at_lsst-oradb04 ~]$ pstack 14649 > > #0 0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6 > > #1 0x0000000000440b84 in waitchld.isra.10 () > > #2 0x0000000000441e3c in wait_for () > > #3 0x0000000000433b0e in execute_command_internal () > > #4 0x0000000000433d2e in execute_command () > > #5 0x000000000041e365 in reader_loop () > > #6 0x000000000041c9ce in main () > > [oracle_at_lsst-oradb04 ~]$ > > > > > > On Tue, Jan 28, 2020 at 4:30 PM Chris Stephens < cstephens16_at_gmail.com> > wrote: > > > ok. i've had the following running since lunchtime and will let it run > through the night in hopes that issue occurs again: > > > > > > _at_snapper ash,stats,trace 10 999999 lgwr > > > > > > I will run the following script as well: > > > > > > #! /bin/bash > > > > > > while [ 1 ] > > > do > > > echo `date` >> /tmp/pstack.26552 > > > pstack 26552 >> /tmp/pstack.26552 > > > echo `date` >> /tmp/pstack.26556 > > > pstack 26556 >> /tmp/pstack.26556 > > > echo `date` >> /tmp/pstack.26560 > > > pstack 26560 >> /tmp/pstack.26560 > > > sleep 5 > > > done > > > > > > based off: > > > > > > [oracle_at_lsst-oradb05 bin]$ ps -ef | egrep "lg.*lsst2db2" > > > oracle 26552 1 0 Jan17 ? 00:06:14 ora_lgwr_lsst2db2 > > > oracle 26556 1 0 Jan17 ? 00:00:13 ora_lg00_lsst2db2 > > > oracle 26560 1 0 Jan17 ? 00:00:12 ora_lg01_lsst2db2 > > > > > > All sessions for this workload are connecting to service w/ tracing > enabled so we'll have trace data as well. > > > > > > I will (hopefully) have updates in the morning. > > ____________________________________________________ > > Please consider the environment before printing this e-mail. > > Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken. > > > Important Notice > This message is intended only for the individual named. It may contain > confidential or privileged information. If you are not the named addressee > you should in particular not disseminate, distribute, modify or copy this > e-mail. Please notify the sender immediately by e-mail, if you have > received this message by mistake and delete it from your system. > Without prejudice to any contractual agreements between you and us which > shall prevail in any case, we take it as your authorization to correspond > with you by e-mail if you send us messages by e-mail. However, we reserve > the right not to execute orders and instructions transmitted by e-mail at > any time and without further explanation. > E-mail transmission may not be secure or error-free as information could > be intercepted, corrupted, lost, destroyed, arrive late or incomplete. Also > processing of incoming e-mails cannot be guaranteed. All liability of > Vontobel Holding Ltd. and any of its affiliates (hereinafter collectively > referred to as "Vontobel Group") for any damages resulting from e-mail use > is excluded. You are advised that urgent and time sensitive messages should > not be sent by e-mail and if verification is required please request a > printed version. Please note that all e-mail communications to and from the > Vontobel Group are subject to electronic storage and review by Vontobel > Group. Unless stated to the contrary and without prejudice to any > contractual agreements between you and Vontobel Group which shall prevail > in any case, e-mail-communication is for informational purposes only and is > not intended as an offer or solicitation for the purchase or sale of any > financial instrument or as an official confirmation of any transaction. > The legal basis for the processing of your personal data is the legitimate > interest to develop a commercial relationship with you, as well as your > consent to forward you commercial communications. You can exercise, at any > time and under the terms established under current regulation, your rights. > If you prefer not to receive any further communications, please contact > your client relationship manager if you are a client of Vontobel Group or > notify the sender. Please note for an exact reference to the affected group > entity the corporate e-mail signature. For further information about data > privacy at Vontobel Group please consult www.vontobel.com. >
-- http://www.freelists.org/webpage/oracle-lReceived on Wed Jan 29 2020 - 17:05:56 CET