Oracle 12c agent troubleshooting (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)

From: Martin Bach <development_at_the-playground.de>
Date: Thu, 27 Oct 2011 15:59:34 +0100
Message-ID: <4EA971D6.8030308_at_the-playground.de>



Good afternoon!

It's been a busy day on the mailing list, and maybe I can benefit from this a little :) Before I begin I have to admit that I'm not the best agent troubleshooter, and 12.1 hasn't made that easier.

I have two agents deployed on a two-node cluster; both have worked in the past. After a reboot, both stopped functioning. Now I have this:

[oracle@rac11203node1 log]$ emctl status agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
Agent Version : 12.1.0.1.0
OMS Version : (unknown)
Protocol Version : 12.1.0.1.0
Agent Home : /u01/app/oracle/product/agent_inst
Agent Binaries : /u01/app/oracle/product/core/12.1.0.1.0
Agent Process ID : 13270
Parent Process ID : 13215
Agent URL : https://rac11203node1.localdomain:3872/emd/main/
Repository URL : https://oem12oms.localdomain:4901/empbs/upload
Started at : 2011-10-26 18:30:17
Started by user : oracle
Last Reload : (none)
Last successful upload : (none)
Last attempted upload : (none)
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload : 1,858
Size of XML files pending upload(MB) : 8.05
Available disk space on upload filesystem : 49.16%
Collection Status : Collections enabled
Last attempted heartbeat to OMS : 2011-10-27 15:42:47
Last successful heartbeat to OMS : (none)

Agent is Running and Ready

The settings are correct; I have verified them against another agent that is uploading and otherwise fine.

I have also secured the agent. Both $AGENT_BASE/agent_inst/sysman/log/secure.log and the emctl secure agent command reported normal, successful operation.

Still the stubborn thing doesn't want to talk to the OMS - in the agent overview page both agents are listed as "unavailable", but not blocked. When I force an upload, I get this:

[oracle@rac11203node1 log]$ emctl upload
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.

EMD upload error:full upload has failed: uploadXMLFiles skipped :: OMS version not checked yet. If this issue persists check trace files for ping to OMS related errors. (OMS_DOWN)

However, the OMS is not down: another agent (which happens to be on the same box as the OMS) can reach it.

[oracle@oem12oms 12.1.0.1.0]$ $ORACLE_HOME/bin/emctl status agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
Agent Version : 12.1.0.1.0
OMS Version : 12.1.0.1.0
Protocol Version : 12.1.0.1.0
Agent Home : /u01/gc12.1/agent/agent_inst
Agent Binaries : /u01/gc12.1/agent/core/12.1.0.1.0
Agent Process ID : 2964
Parent Process ID : 2910
Agent URL : https://oem12oms.localdomain:3872/emd/main/
Repository URL : https://oem12oms.localdomain:4901/empbs/upload
Started at : 2011-10-15 21:00:37
Started by user : oracle
Last Reload : (none)
Last successful upload : 2011-10-27 15:46:38
Last attempted upload : 2011-10-27 15:46:38
Total Megabytes of XML files uploaded so far : 137.79
Number of XML files pending upload : 0
Size of XML files pending upload(MB) : 0
Available disk space on upload filesystem : 50.78%
Collection Status : Collections enabled
Last attempted heartbeat to OMS : 2011-10-27 15:48:34
Last successful heartbeat to OMS : 2011-10-27 15:48:34

Agent is Running and Ready
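
Side by side, the two outputs differ in exactly the fields that depend on talking to the OMS: OMS Version, Last successful upload and Last successful heartbeat to OMS. To make that comparison repeatable, here is a minimal shell sketch that pulls a single field out of the emctl status agent output. The helper name status_field is made up for illustration, and it assumes emctl prints one "Name : value" pair per line:

```shell
# Hypothetical helper (not part of emctl): print the value of one field
# from `emctl status agent` output read on stdin.
# Assumes one "Name : value" pair per line.
status_field() {
    awk -v name="$1" '
        {
            pos = index($0, " : ")          # first " : " separates name and value
            if (pos == 0) next              # banner and status lines have no separator
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 3)
            gsub(/^[ \t]+|[ \t]+$/, "", key)  # trim padding around the name
            gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim padding around the value
            if (key == name) { print val; exit }
        }'
}
```

Used as emctl status agent | status_field "OMS Version", this should print (unknown) on the failing node and 12.1.0.1.0 on the OMS host, going by the outputs above.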

And no, it's not the firewall: it is turned off, and I can connect to the upload URL from any machine in the network:

[oracle@rac11203node1 log]$ wget --no-check-certificate https://oem12oms.localdomain:4901/empbs/upload
--2011-10-27 15:55:46-- https://oem12oms.localdomain:4901/empbs/upload
Resolving oem12oms.localdomain... 192.168.99.28
Connecting to oem12oms.localdomain|192.168.99.28|:4901... connected.
WARNING: cannot verify oem12oms.localdomain’s certificate, issued by “/O=EnterpriseManager on oem12oms.localdomain/OU=EnterpriseManager on oem12oms.localdomain/L=EnterpriseManager on oem12oms.localdomain/ST=CA/C=US/CN=oem12oms.localdomain”:
  Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 314 [text/html]
Saving to: “upload.1”

100%[======================================>] 314 --.-K/s in 0s

2011-10-27 15:55:46 (5.19 MB/s) - “upload.1” saved [314/314]

The agent complains about this in gcagent.log:

2011-10-27 15:56:08,947 [37:3F09CD9C] WARN - improper ping interval (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,471 [167:E3E93C4C] WARN - improper ping interval (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,472 [167:E3E93C4C] WARN - Ping protocol error o.s.gcagent.ping.PingProtocolException [OMS sent an invalid response: "BACKOFF::180000"]

At least someone in Oracle has some humour when it comes to this :) For those who read all of this: have you seen that before? Any pointers appreciated.
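
In case it helps anyone reproduce this, here is a rough shell sketch for counting these warnings in gcagent.log and pulling out the number the OMS sends back. The function name backoff_summary is invented for this example, and reading 180000 as milliseconds (that is, 180 seconds) is only a guess; nothing in the log states the unit:

```shell
# Hypothetical helper: summarise BACKOFF ping responses found in agent
# log lines read on stdin. Only the "improper ping interval" warnings
# carrying EM_PING_NOTIF_RESPONSE are counted.
backoff_summary() {
    awk '
        /EM_PING_NOTIF_RESPONSE: BACKOFF::[0-9]+/ {
            count++
            match($0, /BACKOFF::[0-9]+/)
            # skip the 9 characters of "BACKOFF::" to keep only the digits
            last = substr($0, RSTART + 9, RLENGTH - 9)
        }
        END {
            if (count)
                # the seconds figure ASSUMES the value is in milliseconds
                printf "%d BACKOFF responses, last value %d (%.0f s if milliseconds)\n", count, last, last / 1000
            else
                print "no BACKOFF responses found"
        }'
}
```

Run from the agent log directory as backoff_summary < gcagent.log, assuming gcagent.log sits in the current directory as the prompt above suggests.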

Martin
--
http://www.linkedin.com/in/martincarstenbach
http://martincarstenbach.wordpress.com
--
http://www.freelists.org/webpage/oracle-l

Received on Thu Oct 27 2011 - 09:59:34 CDT