RE: pr00 hanging in "parallel recovery control message reply"

From: Noveljic Nenad <nenad.noveljic_at_vontobel.com>
Date: Thu, 26 Nov 2020 16:49:15 +0000
Message-ID: <31079_1606409372_5FBFDC9C_31079_8468_1_186d082e17ad4d0097695b84f659f3c3_at_vontobel.com>



The problem was caused by a hanging nanosleep system call (an OS bug):

23976: ora_pr00_DB
00007ffed4b76eba nanosleep (7fffbfff8a90, 7fffbfff8aa0) 000000000afedcdc sltrusleep () + 6c
000000001899e10b kcrm_logmgr_idle_action () + 26b 000000001899d985 kcrm_wait_for_reply () + 55 0000000018ed883d krd_move_influx () + 44d 0000000018ed94f9 krdsoclean () + 599
000000001917708d krr_do_media_recovery () + 16bbd 0000000018eaa45b krddmr () + c7b
0000000018eebd1d krd_logmerger_driver () + 1a6d 000000001909f280 krp_slave_main () + bc0 000000000c12f49b ksvrdp_int () + 94b
000000000abbb283 opirip () + 263
0000000009e055eb opidrv () + 24b
0000000009e05307 sou2o () + 97
0000000009e050cf opimai_real () + 9f
0000000009e04c66 ssthrdmain () + 266
0000000009e049d7 main () + a7
000000000b4c237b ???????? ()

Best regards,

Nenad

From: Noveljic Nenad
Sent: Dienstag, 20. Oktober 2020 15:49
To: 'ORACLE-L' <oracle-l_at_freelists.org> Subject: RE: pr00 hanging in "parallel recovery control message reply"

I've managed to capture the last x$trace entries before the recovery hung.

select pname,function,file_loc,event,section,data from x$trace t, v$process p where p.pid = t.pid and p.pname like '%PR%' order by time ;

PNAME FUNCTION FILE_LOC EVENT SECTION DATA

----- ---------- ------------ ------ ---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
PR0O  kslwtectx  ksl2.c_at_3485<mailto:ksl2.c_at_3485>   10005            KSL WAIT END [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26219 seq_num=26220 snap_id=1
PR0O  kslwtectx  ksl2.c_at_3493<mailto:ksl2.c_at_3493>   10005            KSL WAIT END wait times (usecs) - snap=297, exc=297, tot=297
PR0O  kslwtbctx  ksl2.c_at_3342<mailto:ksl2.c_at_3342>   10005            KSL WAIT BEG [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26220 seq_num=26221 snap_id=1
PR0O  kslwtectx  ksl2.c_at_3493<mailto:ksl2.c_at_3493>   10005            KSL WAIT END wait times (usecs) - snap=54, exc=54, tot=54
PR0O  kslwtectx  ksl2.c_at_3485<mailto:ksl2.c_at_3485>   10005            KSL WAIT END [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26220 seq_num=26221 snap_id=1
PR21  kslwtectx  ksl2.c_at_3485<mailto:ksl2.c_at_3485>   10005            KSL WAIT END [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26402 seq_num=26403 snap_id=1
PR21  kslwtectx  ksl2.c_at_3493<mailto:ksl2.c_at_3493>   10005            KSL WAIT END wait times (usecs) - snap=438, exc=438, tot=438
PR0X  kslwtectx  ksl2.c_at_3485<mailto:ksl2.c_at_3485>   10005            KSL WAIT END [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26501 seq_num=26502 snap_id=1
PR0X  kslwtectx  ksl2.c_at_3493<mailto:ksl2.c_at_3493>   10005            KSL WAIT END wait times (usecs) - snap=464, exc=464, tot=464
PR07  kslwtbctx  ksl2.c_at_3342<mailto:ksl2.c_at_3342>   10005            KSL WAIT BEG [latch: messages] 1610659696/0x6000b770 53/0x35 0/0x0 wait_id=26874 seq_num=26875 snap_id=1
PR07  kslwtectx  ksl2.c_at_3485<mailto:ksl2.c_at_3485>   10005            KSL WAIT END [latch: messages] 1610659696/0x6000b770 53/0x35 0/0x0 wait_id=26874 seq_num=26875 snap_id=1
PR07  kslwtectx  ksl2.c_at_3493<mailto:ksl2.c_at_3493>   10005            KSL WAIT END wait times (usecs) - snap=4, exc=4, tot=4
PR29  kslwtbctx  ksl2.c_at_3342<mailto:ksl2.c_at_3342>   10005            KSL WAIT BEG [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26605 seq_num=26606 snap_id=1
PR16  kslwtbctx  ksl2.c_at_3342<mailto:ksl2.c_at_3342>   10005            KSL WAIT BEG [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26767 seq_num=26768 snap_id=1
PR1Z  kslwtbctx  ksl2.c_at_3342<mailto:ksl2.c_at_3342>   10005            KSL WAIT BEG [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26818 seq_num=26819 snap_id=1
PR29  kslwtectx  ksl2.c_at_3485<mailto:ksl2.c_at_3485>   10005            KSL WAIT END [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26605 seq_num=26606 snap_id=1
PR29  kslwtectx  ksl2.c_at_3493<mailto:ksl2.c_at_3493>   10005            KSL WAIT END wait times (usecs) - snap=148, exc=148, tot=148
PR16  kslwtectx  ksl2.c_at_3485<mailto:ksl2.c_at_3485>   10005            KSL WAIT END [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26767 seq_num=26768 snap_id=1
PR16  kslwtectx  ksl2.c_at_3493<mailto:ksl2.c_at_3493>   10005            KSL WAIT END wait times (usecs) - snap=237, exc=237, tot=237
PR1Z  kslwtectx  ksl2.c_at_3493<mailto:ksl2.c_at_3493>   10005            KSL WAIT END wait times (usecs) - snap=280, exc=280, tot=280
PR1Z  kslwtectx  ksl2.c_at_3485<mailto:ksl2.c_at_3485>   10005            KSL WAIT END [latch free] 1610768712/0x60026148 325/0x145 0/0x0 wait_id=26818 seq_num=26819 snap_id=1
PR00  kccwbp     kcc.c_at_18381<mailto:kcc.c_at_18381>   10021            0115000000000000 0115000000000000 b06db40000000000

PR00 executed kccwbp (write to the control file).

This call has already finished when the recovery hung:

pstack 18229
18229: ora_pr00_DB
00007ffed487703a portfs (5, 4, 7fffbfff4840, 0, 5f5e100, 7fffbfff4860) 00000000061568bd sskgpwwait () + dd
00000000061564a3 skgpwwait () + c3
00000000063594ff ksliwat () + 9af
00000000063586ef kslwaitctx () + af
0000000015b6dd85 kcrm_wait_for_reply () + 125 0000000016368183 krr_do_media_recovery () + c183 00000000160935eb krddmr () + dab
00000000160dccd6 krd_logmerger_driver () + 7c6 00000000162a13cc krp_slave_main () + cac 00000000083b374d ksvrdp_int () + 81d
0000000006e84f47 opirip () + 257
00000000060a19d0 opidrv () + 160
00000000060a17e7 sou2o () + 97
00000000060a15b4 opimai_real () + a4
00000000060a110b ssthrdmain () + 26b
00000000060a0e79 main () + a9
00000000076f7784 ???????? ()

Three pr processes (PR1Z, PR16 and PR29) had contention on "active checkpoint queue latch".

SQL> select addr,name from v$latch where addr like '%60026148' or addr like '%6000B770' ;

ADDR NAME

---------------- ----------------------------------------------------------------
000000006000B770 messages
0000000060026148 active checkpoint queue latch

PR07 allocated the message latch. Maybe its message got lost?

I’ve observed this problem on 19.5 and 19.7, but not on other releases.

Best regards,

Nenad

http://nenadnoveljic.com/blog/



Please consider the environment before printing this e-mail. Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css">p { font-family: Arial;font-size:9pt }</style>
</head>
<body>
<p>
<br>Important Notice</br>
<br />

This message is intended only for the individual named. It may contain confidential or privileged information. If you are not the named addressee you should in particular not disseminate, distribute, modify or copy this e-mail. Please notify the sender immediately by e-mail, if you have received this message by mistake and delete it from your system.<br /> Without prejudice to any contractual agreements between you and us which shall prevail in any case, we take it as your authorization to correspond with you by e-mail if you send us messages by e-mail. However, we reserve the right not to execute orders and instructions transmitted by e-mail at any time and without further explanation.<br /> E-mail transmission may not be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete. Also processing of incoming e-mails cannot be guaranteed. All liability of Vontobel Holding Ltd. and any of its affiliates (hereinafter collectively referred to as "Vontobel Group") for any damages resulting from e-mail use is excluded. You are advised that urgent and time sensitive messages should not be sent by e-mail and if verification is required please request a printed version.</br> Please note that all e-mail communications to and from the Vontobel Group are subject to electronic storage and review by Vontobel Group. Unless stated to the contrary and without prejudice to any contractual agreements between you and Vontobel Group which shall prevail in any case, e-mail-communication is for informational purposes only and is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction.<br /> The legal basis for the processing of your personal data is the legitimate interest to develop a commercial relationship with you, as well as your consent to forward you commercial communications. You can exercise, at any time and under the terms established under current regulation, your rights. If you prefer not to receive any further communications, please contact your client relationship manager if you are a client of Vontobel Group or notify the sender. Please note for an exact reference to the affected group entity the corporate e-mail signature. For further information about data privacy at Vontobel Group please consult <a href="https://www.vontobel.com">www.vontobel.com</a>.<br />
</p>
</body>
</html>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Nov 26 2020 - 17:49:15 CET

Original text of this message