Re: intermittent long "log file sync" waits
Date: Wed, 29 Jan 2020 10:51:12 -0600
Message-ID: <CAEFL0sw5yL7j=PQOYDLd5s7=8e87CCOznUtV_=i8-9Hzou=S7Q_at_mail.gmail.com>
15 rows selected.
Wed Jan 29 10:42:12 CST 2020
Linux Process Snapper v0.14 by Tanel Poder [https://tp.dev/psnapper]
Sampling /proc/syscall, wchan, cmdline, stat for 5 seconds... finished.

=== Active Threads ===========================================================

samples | avg_threads | cmdline           | state            | syscall    | wchan
------------------------------------------------------------------------------
      4 |        0.04 | ora_lgwr_lsst2db2 | Running (ON CPU) | semtimedop | SYSC_semtimedop
samples: 100 (expected: 100)
total processes: 1, threads: 1
runtime: 5.00, measure time: 0.50
Wed Jan 29 10:42:17 CST 2020
Linux Process Snapper v0.14 by Tanel Poder [https://tp.dev/psnapper]
Sampling /proc/syscall, wchan, cmdline, stat for 5 seconds... finished.

=== Active Threads ===========================================================

samples | avg_threads | cmdline           | state            | syscall    | wchan
------------------------------------------------------------------------------
      1 |        0.01 | ora_lgwr_lsst2db2 | Running (ON CPU) | semtimedop | SYSC_semtimedop
      1 |        0.01 | ora_lgwr_lsst2db2 | Running (ON CPU) | semtimedop | read_events
Neither of the Log Writer Slave processes showed any activity via psnapper.
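The "looping psnapper on all LG* processes" plan mentioned below could be sketched roughly like this. This is an illustrative sketch, not a script from the thread: it assumes `psn` (pSnapper) is on the PATH of the database host and that the instance name is lsst2db2, as in the `ps` output later in the thread.

```shell
#!/bin/bash
# Sketch: sample every lg* background process of one instance with pSnapper.
# Assumes psn is on PATH; pid discovery is done with pgrep.

snap_lg() {
    local sid="$1" outdir="${2:-/tmp}"
    # pgrep -f matches full command lines such as ora_lgwr_lsst2db2,
    # ora_lg00_lsst2db2, ora_lg01_lsst2db2
    for pid in $(pgrep -f "ora_lg.*_${sid}"); do
        echo "$(date) sampling pid ${pid}" >> "${outdir}/psn.${pid}"
        psn -p "${pid}" -g comm,state,syscall,wchan >> "${outdir}/psn.${pid}" 2>&1
    done
}

# On the database host, loop it:
#   while true; do snap_lg lsst2db2 /tmp; sleep 1; done
```

Since each `psn` invocation already samples for several seconds (five in the output above), even a short sleep between iterations gives near-continuous coverage.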
On Wed, Jan 29, 2020 at 10:05 AM Chris Stephens <cstephens16_at_gmail.com> wrote:
> ugh. its still occurring. i don't think it is related to log file switches:
>
> SQL> @log_switches
>
> Date                 INST_ID  Day  Total  h0  h1  h2  h3  h4  h5  h6  h7  h8  h9  h10  h11  h12  h13  h14  h15  h16  h17  h18  h19  h20  h21  h22  h23   Avg
> 2020-01-29 00:00:00        1  Wed     10   1   1   3   1   1   1   1   1   0   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.42
> 2020-01-29 00:00:00        2  Wed     13   2   1   3   1   2   2   1   1   0   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.54
> 2020-01-29 00:00:00        3  Wed     14   2   1   4   1   2   1   1   2   0   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.58
>
> SQL> @ashtop event2 1=1 sysdate-5/60/24 sysdate
>
> TOTALSECONDS  AAS  %This  EVENT2                          FIRST_SEEN           LAST_SEEN            DIST_SQLEXEC_SEEN
>          123  0.4  52% |  ON CPU                          2020-01-29 09:04:01  2020-01-29 09:08:51  48
>           96  0.3  41% |  log file sync                   2020-01-29 09:06:04  2020-01-29 09:08:59  1
>            3    0   1% |  DLM cross inst call completion  2020-01-29 09:04:59  2020-01-29 09:07:23  3
>            3    0   1% |  PX Deq: Slave Session Stats     2020-01-29 09:05:30  2020-01-29 09:06:58  1
>            2    0   1% |  ASM file metadata operation     2020-01-29 09:05:18  2020-01-29 09:06:03  2
>            2    0   1% |  library cache lock              2020-01-29 09:04:54  2020-01-29 09:07:24  2
>            1    0   0% |  Disk file operations I/O        2020-01-29 09:05:09  2020-01-29 09:05:09  1
>            1    0   0% |  KSV master wait                 2020-01-29 09:05:18  2020-01-29 09:05:18  1
>            1    0   0% |  PX Deq: Join ACK                2020-01-29 09:08:01  2020-01-29 09:08:01  1
>            1    0   0% |  PX Deq: reap credit             2020-01-29 09:06:58  2020-01-29 09:06:58  1
>            1    0   0% |  control file parallel write     2020-01-29 09:05:06  2020-01-29 09:05:06  1
>            1    0   0% |  db file async I/O submit        2020-01-29 09:05:37  2020-01-29 09:05:37  1
>            1    0   0% |  enq: PS - contention [mode=6]   2020-01-29 09:05:30  2020-01-29 09:05:30  1
>            1    0   0% |  library cache pin               2020-01-29 09:04:56  2020-01-29 09:04:56  1
>
> 14 rows selected.
>
> SQL> @ashtop event2,program 1=1 sysdate-5/60/24 sysdate
>
> TOTALSECONDS  AAS  %This  EVENT2                          PROGRAM                                   FIRST_SEEN           LAST_SEEN            DIST_SQLEXEC_SEEN
>           85  0.3  34% |  log file sync                   python@lsst-verify-worker28 (TNS V1-V3)   2020-01-29 09:07:36  2020-01-29 09:09:24  1
>           56  0.2  22% |  ON CPU                          sqlplus@lsst-oradb05 (TNS V1-V3)          2020-01-29 09:05:15  2020-01-29 09:09:54  24
>           20  0.1   8% |  log file sync                   sqlplus@lsst-verify-worker28 (TNS V1-V3)  2020-01-29 09:06:04  2020-01-29 09:10:02  1
>           15  0.1   6% |  ON CPU                          python@lsst-verify-worker28 (TNS V1-V3)   2020-01-29 09:06:22  2020-01-29 09:09:21  13
>           11    0   4% |  ON CPU                          sqlplus@lsst-verify-worker28 (TNS V1-V3)  2020-01-29 09:06:01  2020-01-29 09:10:04  8
>            5    0   2% |  ON CPU                          oracle@lsst-oradb05 (DIA0)                2020-01-29 09:07:57  2020-01-29 09:10:02  1
>            5    0   2% |  ON CPU                          oracle@lsst-oradb06 (CJQ0)                2020-01-29 09:09:01  2020-01-29 09:09:05  1
>            5    0   2% |  ON CPU                          oracle@lsst-oradb06 (PPA7)                2020-01-29 09:06:46  2020-01-29 09:09:54  1
>            4    0   2% |  ON CPU                          oracle@lsst-oradb04 (CJQ0)                2020-01-29 09:05:05  2020-01-29 09:05:08  1
>            4    0   2% |  ON CPU                          oracle@lsst-oradb05 (LMON)                2020-01-29 09:06:37  2020-01-29 09:09:38  1
>            4    0   2% |  ON CPU                          oracle@lsst-oradb06 (LMON)                2020-01-29 09:05:06  2020-01-29 09:09:06  1
>            3    0   1% |  PX Deq: Slave Session Stats     oracle@lsst-oradb05 (PPA7)                2020-01-29 09:05:30  2020-01-29 09:06:58  1
>            2    0   1% |  DLM cross inst call completion  sqlplus@lsst-verify-worker28 (TNS V1-V3)  2020-01-29 09:06:02  2020-01-29 09:07:23  2
>            2    0   1% |  ON CPU                          oracle@lsst-oradb04 (PSP0)                2020-01-29 09:06:17  2020-01-29 09:06:18  1
>            2    0   1% |  ON CPU                          oracle@lsst-oradb05 (MMON)                2020-01-29 09:05:17  2020-01-29 09:05:47  2
>
> 15 rows selected.
>
> SQL> @ashtop session_id,session_serial#,event2,program 1=1 sysdate-5/60/24 sysdate
>
> TOTALSECONDS  AAS  %This  SESSION_ID  SESSION_SERIAL#  EVENT2         PROGRAM                                   FIRST_SEEN           LAST_SEEN            DIST_SQLEXEC_SEEN
>           85  0.3  36% |        2078            23219  log file sync  python@lsst-verify-worker28 (TNS V1-V3)   2020-01-29 09:07:36  2020-01-29 09:09:24  1
>           56  0.2  24% |        2085            56305  ON CPU         sqlplus@lsst-oradb05 (TNS V1-V3)          2020-01-29 09:06:44  2020-01-29 09:11:34  25
>            6    0   3% |        2078             5162  log file sync  sqlplus@lsst-verify-worker28 (TNS V1-V3)  2020-01-29 09:09:57  2020-01-29 09:10:02  1
>            6    0   3% |        2078            24706  ON CPU         python@lsst-verify-worker28 (TNS V1-V3)   2020-01-29 09:10:12  2020-01-29 09:10:39  2
>            5    0   2% |           4            53890  ON CPU         oracle@lsst-oradb06 (CJQ0)                2020-01-29 09:09:01  2020-01-29 09:09:05  1
>            5    0   2% |         123            45272  ON CPU         oracle@lsst-oradb05 (DIA0)                2020-01-29 09:07:57  2020-01-29 09:10:02  1
>            5    0   2% |        2078            59858  ON CPU         python@lsst-verify-worker28 (TNS V1-V3)   2020-01-29 09:06:43  2020-01-29 09:06:52  5
>            4    0   2% |         246             4525  ON CPU         oracle@lsst-oradb05 (LMON)                2020-01-29 09:07:37  2020-01-29 09:11:38  1
>            4    0   2% |        2078             5162  ON CPU         sqlplus@lsst-verify-worker28 (TNS V1-V3)  2020-01-29 09:09:56  2020-01-29 09:10:05  3
>            4    0   2% |        2078            23219  ON CPU         python@lsst-verify-worker28 (TNS V1-V3)   2020-01-29 09:09:10  2020-01-29 09:09:21  4
>            3    0   1% |         246             5232  ON CPU         oracle@lsst-oradb06 (LMON)                2020-01-29 09:09:06  2020-01-29 09:11:06  1
>            3    0   1% |        2078            20119  ON CPU         sqlplus@lsst-verify-worker28 (TNS V1-V3)  2020-01-29 09:07:25  2020-01-29 09:07:27  2
>            2    0   1% |         123            25437  ON CPU         oracle@lsst-oradb06 (DIA0)                2020-01-29 09:08:20  2020-01-29 09:11:22  1
>            2    0   1% |         368            63897  ON CPU         oracle@lsst-oradb06 (LMD0)                2020-01-29 09:07:29  2020-01-29 09:09:10  1
>            2    0   1% |         489            23602  ON CPU         oracle@lsst-oradb06 (PSP0)                2020-01-29 09:07:03  2020-01-29 09:09:08  1
>
> 15 rows selected.
>
> "@snapper ash,stats,trace 10 999999 lgwr" shows no active session (v$session sampling) output during the FIRST_SEEN - LAST_SEEN interval above.
>
> i'm going to start looping psnapper on all LG* processes and hope that turns something up.
>
> still *very* open to any other suggestions. this is driving me crazy.
>
> On Wed, Jan 29, 2020 at 9:00 AM Noveljic Nenad <nenad.noveljic_at_vontobel.com> wrote:
>
>> Yes, it was strange to see rman backups in all your reports. We should have paid more attention to it. But there were also periods when the backup was running without impacting log file sync.
>>
>> > possibly locks on CF?
>>
>> Locks on CF are typically held during log switches. You can correlate log switch times with the log file sync problem to answer this. I think Jonathan has already suggested that.
>>
>> > not sure why that wouldn't show up on ASH though.
>>
>> That's a big question – I've never seen that before. pSnapper would provide the information on what the lg* processes were doing.
>>
>> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> On Behalf Of Chris Stephens
>> Sent: Mittwoch, 29. Januar 2020 15:49
>> To: Stefan Koehler <contact_at_soocs.de>
>> Cc: oracle-l <Oracle-L_at_freelists.org>
>> Subject: Re: intermittent long "log file sync" waits
>>
>> holy shnickees psnapper is awesome. i discovered quite a few backup related processes that were stuck w/ pstack prior to pulling down and using psnapper. after killing those sessions, the remaining rman processes showed progress w/ psnapper output like:
>>
>> psnapper]# ./psn -p 874 -g comm,state,syscall
>>
>> Linux Process Snapper v0.14 by Tanel Poder [https://tp.dev/psnapper]
>> Sampling /proc/syscall, stat for 5 seconds... finished.
>> === Active Threads ===========================================================
>>
>> samples | avg_threads | comm            | state                  | syscall
>> ------------------------------------------------------------------------------
>>      43 |        0.43 | (oracle_*_lsst) | Disk (Uninterruptible) | pread64
>>      31 |        0.31 | (oracle_*_lsst) | Disk (Uninterruptible) | lstat
>>       7 |        0.07 | (oracle_*_lsst) | Running (ON CPU)       | pread64
>>       6 |        0.06 | (oracle_*_lsst) | Running (ON CPU)       | fstatfs
>>       3 |        0.03 | (oracle_*_lsst) | Disk (Uninterruptible) | [running]
>>       3 |        0.03 | (oracle_*_lsst) | Disk (Uninterruptible) | open
>>       3 |        0.03 | (oracle_*_lsst) | Running (ON CPU)       | [running]
>>       1 |        0.01 | (oracle_*_lsst) | Disk (Uninterruptible) | fstatfs
>>       1 |        0.01 | (oracle_*_lsst) | Running (ON CPU)       | open
>>
>> it turns out we (i) copied over backup scripts and cron entries to another cluster i built up a few weeks ago. i thought i had successfully replaced all references to the database experiencing issues but that was not the case. I killed all processes related to backup scripts kicked off from this new RAC connecting to the problem RAC and now ASH output is looking much better.
>>
>> i will continue to monitor for unreasonable "log file sync" waits but i suspect this was the cause of the issue. possibly locks on CF? not sure why that wouldn't show up on ASH though.
>>
>> On Wed, Jan 29, 2020 at 2:13 AM Stefan Koehler <contact_at_soocs.de> wrote:
>>
>> Hello Chris,
>>
>> as I already mentioned - please use pSnapper ( https://github.com/tanelpoder/psnapper ) by Tanel Poder.
>>
>> pSnapper samples the /proc file system and does not suspend the process like pstack (which is just a wrapper around gdb and uses ptrace() under the hood). Suspending a process can screw it up :)
>>
>> In addition, pSnapper provides more useful information (e.g. state, the C function that runs in kernel space for a specific syscall, etc.).
>> Best Regards
>> Stefan Koehler
>>
>> Independent Oracle performance consultant and researcher
>> Website: http://www.soocs.de
>> Twitter: @OracleSK
>>
>> > Chris Stephens <cstephens16_at_gmail.com> hat am 28. Januar 2020 um 23:49 geschrieben:
>> >
>> > was just getting ready to sign off and noticed the archivelog backup scheduled to run every hour seems to be stuck and has been for at least 10 mins:
>> >
>> > [oracle@lsst-oradb04 ~]$ ps -ef | grep oracle_backup.sh
>> > oracle   13369 31167  0 16:43 pts/0  00:00:00 grep --color=auto oracle_backup.sh
>> > oracle   14649 14645  0 16:00 ?      00:00:00 /bin/bash /home/oracle/scripts/rman/oracle_backup.sh -d lsst2db -s archivelog.rman -c lsst2db_rman -e cs2018_at_ncsa.illinois.edu -r iddsdba_rman
>> > [oracle@lsst-oradb04 ~]$ pstack 15649
>> > Process 15649 not found.
>> > [oracle@lsst-oradb04 ~]$ pstack 14649
>> > #0  0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
>> > #1  0x0000000000440b84 in waitchld.isra.10 ()
>> > #2  0x0000000000441e3c in wait_for ()
>> > #3  0x0000000000433b0e in execute_command_internal ()
>> > #4  0x0000000000433d2e in execute_command ()
>> > #5  0x000000000041e365 in reader_loop ()
>> > #6  0x000000000041c9ce in main ()
>> > [oracle@lsst-oradb04 ~]$ pstack 14649
>> > #0  0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
>> > #1  0x0000000000440b84 in waitchld.isra.10 ()
>> > #2  0x0000000000441e3c in wait_for ()
>> > #3  0x0000000000433b0e in execute_command_internal ()
>> > #4  0x0000000000433d2e in execute_command ()
>> > #5  0x000000000041e365 in reader_loop ()
>> > #6  0x000000000041c9ce in main ()
>> > [oracle@lsst-oradb04 ~]$ pstack 14649
>> > #0  0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
>> > #1  0x0000000000440b84 in waitchld.isra.10 ()
>> > #2  0x0000000000441e3c in wait_for ()
>> > #3  0x0000000000433b0e in execute_command_internal ()
>> > #4  0x0000000000433d2e in execute_command ()
>> > #5  0x000000000041e365 in reader_loop ()
>> > #6  0x000000000041c9ce in main ()
>> > [oracle@lsst-oradb04 ~]$ pstack 14649
>> > #0  0x00007f9a6537e41c in waitpid () from /lib64/libc.so.6
>> > #1  0x0000000000440b84 in waitchld.isra.10 ()
>> > #2  0x0000000000441e3c in wait_for ()
>> > #3  0x0000000000433b0e in execute_command_internal ()
>> > #4  0x0000000000433d2e in execute_command ()
>> > #5  0x000000000041e365 in reader_loop ()
>> > #6  0x000000000041c9ce in main ()
>> > [oracle@lsst-oradb04 ~]$
>> >
>> > On Tue, Jan 28, 2020 at 4:30 PM Chris Stephens <cstephens16_at_gmail.com> wrote:
>> > > ok. i've had the following running since lunchtime and will let it run through the night in hopes that the issue occurs again:
>> > >
>> > > @snapper ash,stats,trace 10 999999 lgwr
>> > >
>> > > I will run the following script as well:
>> > >
>> > > #!/bin/bash
>> > >
>> > > while [ 1 ]
>> > > do
>> > >   echo `date` >> /tmp/pstack.26552
>> > >   pstack 26552 >> /tmp/pstack.26552
>> > >   echo `date` >> /tmp/pstack.26556
>> > >   pstack 26556 >> /tmp/pstack.26556
>> > >   echo `date` >> /tmp/pstack.26560
>> > >   pstack 26560 >> /tmp/pstack.26560
>> > >   sleep 5
>> > > done
>> > >
>> > > based off:
>> > >
>> > > [oracle@lsst-oradb05 bin]$ ps -ef | egrep "lg.*lsst2db2"
>> > > oracle   26552     1  0 Jan17 ?  00:06:14 ora_lgwr_lsst2db2
>> > > oracle   26556     1  0 Jan17 ?  00:00:13 ora_lg00_lsst2db2
>> > > oracle   26560     1  0 Jan17 ?  00:00:12 ora_lg01_lsst2db2
>> > >
>> > > All sessions for this workload are connecting to a service w/ tracing enabled so we'll have trace data as well.
>> > >
>> > > I will (hopefully) have updates in the morning.
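The suggestion earlier in the thread to correlate log switch times with the log file sync problem could be sketched as below. This is an illustrative script, not from the thread: it assumes sqlplus with SYSDBA access on the database host and uses the standard v$log_history and gv$active_session_history views (querying ASH requires the Diagnostics Pack license); verify the view names on your release.

```shell
#!/bin/bash
# Sketch: minute-by-minute counts of log switches vs. "log file sync" ASH
# samples over the last 5 minutes, to eyeball whether the spikes line up.

# read -d '' returns nonzero at EOF even on success, hence || true
read -r -d '' CORRELATE_SQL <<'SQL' || true
select to_char(first_time, 'YYYY-MM-DD HH24:MI') minute, count(*) switches
from   v$log_history
where  first_time > sysdate - 5/60/24
group  by to_char(first_time, 'YYYY-MM-DD HH24:MI')
order  by 1;

select to_char(sample_time, 'YYYY-MM-DD HH24:MI') minute, count(*) sync_samples
from   gv$active_session_history
where  event = 'log file sync'
and    sample_time > sysdate - 5/60/24
group  by to_char(sample_time, 'YYYY-MM-DD HH24:MI')
order  by 1;
SQL

if command -v sqlplus >/dev/null 2>&1; then
    printf '%s\n' "$CORRELATE_SQL" | sqlplus -s / as sysdba
else
    echo "sqlplus not found; run CORRELATE_SQL from a database session instead"
fi
```

If the "log file sync" minutes line up with switch minutes, CF enqueue activity during switches becomes a plausible suspect; if not (as in this thread), the cause lies elsewhere.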
--
http://www.freelists.org/webpage/oracle-l

Received on Wed Jan 29 2020 - 17:51:12 CET