RE: DMON killing RSM0?

From: Herring, Dave
Date: Sat, 13 Jun 2020 16:26:36 +0000
Message-ID: <CH2PR02MB6664FBEEB9111055E0815E64D49E0_at_CH2PR02MB6664.namprd02.prod.outlook.com>



Jonathan, that's actually a pretty good point. The database team (the whole team was outsourced a year ago) raised a ticket with Oracle about this at my request, and up until this point I felt the investigation was sub-par (one suggestion was to restart the dbs). Last night Oracle came back and said DMON is killing off the RSM0 processes because of an internal timeout, and suggested increasing the broker's "OperationTimeout" property. I plan on trying to track down someone from the network team to see whether there's an exceptional load on that portion of the network every night. The primary and standby dbs are in separate data centers, about 60 miles apart. It's odd that these timeouts only started after moving from X-4 to X-8.

This list is definitely my sanity check and, as always, I appreciate its existence and all those willing to take the time to share their valuable experience and thoughts!

Regards,

Dave

From: Jonathan Lewis <jlewisoracle_at_gmail.com> Sent: Saturday, June 13, 2020 3:24 AM
To: Herring, Dave <HerringD_at_DNB.com> Cc: oracle-l_at_freelists.org
Subject: Re: DMON killing RSM0?


Offered from a position of complete ignorance of any interaction between DMON & RSM0.

Since ORA-16665 is a timeout error and the kill is happening on ALL the databases, is it possible that something floods your network to such an extent that DMON starts getting timeouts when talking to RSM0? It might interpret this as RSM0 becoming unstable (I think there are some bugs on MOS about memory leaks and subsequent performance problems) and respond by requesting that it be killed and restarted.

Regards
Jonathan Lewis

On Fri, Jun 12, 2020 at 10:40 PM Herring, Dave <dmarc-noreply_at_freelists.org> wrote:

I have a situation where it looks like the DMON process is killing off RSM0 processes every night at around the same time, and I don't have a good explanation as to why. This is on a 4-node Exadata env running 18c with 6 dbs, all using DG (the standby is also a 4-node Exadata env).

Every night between 20:12 and 21:35 we get a series of ORA-16665 errors from all databases, which show up in the broker's log file. Checking each db's alert log, I see messages like the following:

Process RSM0, PID = 51310, will be killed
Process termination requested for pid 51310 [source = rdbms], [info = 2] [request issued by pid: 76161, uid: 110]

SPID 76161 is DMON, which means every night DMON kills off RSM0 processes around the same time. This is done for all databases.
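To confirm the pattern across all six databases, each alert log can be scanned for the termination message quoted above. This is a minimal Python sketch, not an Oracle-supplied tool: the regular expression is built from that single sample message, find_kills is a hypothetical helper, and it assumes each alert-log entry is preceded by an ISO 8601 timestamp line (the 12.2+ alert log style). The DMON SPID to pass in is whatever V$PROCESS reports for PNAME = 'DMON' on that instance; it changes if DMON itself is restarted.

```python
import re
from datetime import datetime

# Built from the single alert-log sample above; other releases may word it differently.
TERM_RE = re.compile(
    r"Process termination requested for pid (\d+) .*?"
    r"\[request issued by pid: (\d+), uid: \d+\]"
)


def find_kills(lines, dmon_spid):
    """Return (timestamp, victim_pid, requester_pid) for each kill requested by dmon_spid.

    Assumption: every alert-log entry is preceded by an ISO 8601 timestamp line.
    The timestamp is None if no such line has been seen yet.
    """
    last_ts = None
    kills = []
    for raw in lines:
        line = raw.strip()
        # Remember the most recent timestamp header line and move on.
        try:
            last_ts = datetime.fromisoformat(line)
            continue
        except ValueError:
            pass
        # Keep only termination requests issued by the given DMON SPID.
        match = TERM_RE.search(line)
        if match and int(match.group(2)) == dmon_spid:
            kills.append((last_ts, int(match.group(1)), int(match.group(2))))
    return kills
```

Passing an open file object works as well as a list, since find_kills only iterates over lines. Grouping the returned timestamps by date should show whether every kill falls inside the 20:12 to 21:35 window, which would give the network team a precise interval to check.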

Is there a DG broker setting that says to wipe out all DGB resource processes and restart them?

Regards,

Dave

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Jun 13 2020 - 18:26:36 CEST
