RE: 11.2.0.4 RAC ora.ons and ora.oc4j resources down

From: Stephens, Chris <Chris.Stephens_at_adm.com>
Date: Mon, 4 Nov 2013 13:26:12 -0600
Message-ID: <D95BD5AFADBB0F4E9BB6C53F14D3A05006BBE7BCB8_at_JRCEXC1V1.research.na.admworld.com>



Those files hadn't been modified, but after finding the following in the $GRID_HOME/opmn/logs/ons.log.admoract1n1 log file:

[2013-10-30T11:00:07-05:00] [ons] [NOTIFICATION:1] [104] [ons-internal] ONS server initiated
[2013-10-30T11:00:07-05:00] [ons] [ERROR:1] [17] [ons-listener] 10.7.38.123,6100: BIND (Cannot assign requested address)

and noticing:

[grid_at_admoract1n2 logs]$ ping localhost
PING localhost.na.admworld.com (10.7.38.123) 56(84) bytes of data.

I discovered someone added a DNS entry last Wednesday for "localhost". Ugh. After removing that, ora.ons and ora.oc4j ONLINE'd no problem.
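
For anyone who wants to script the same sanity check rather than eyeball ping output, here is a minimal Python sketch. It is an illustration only, not part of any Oracle tooling; the function names is_loopback and resolve_all are made up for this example. It assumes a system where Python's resolver goes through the same name service switch that ping uses.

```python
import ipaddress
import socket
from typing import List


def is_loopback(address: str) -> bool:
    """Return True if the address is in 127.0.0.0/8 or is ::1."""
    return ipaddress.ip_address(address).is_loopback


def resolve_all(name: str) -> List[str]:
    """Resolve a name through the system resolver and return the unique addresses."""
    infos = socket.getaddrinfo(name, None)
    # Each entry ends with a sockaddr tuple whose first field is the address string.
    return sorted({info[4][0] for info in infos})


def non_loopback_addresses(name: str = "localhost") -> List[str]:
    """Return any addresses for the name that are not loopback addresses."""
    return [addr for addr in resolve_all(name) if not is_loopback(addr)]
```

An empty list from non_loopback_addresses() means "localhost" resolves only to loopback addresses; anything else is worth investigating before blaming the clusterware.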

I've switched the order of "hosts" in /etc/nsswitch.conf from "dns files nis" to "files dns nis" so this won't get us again.
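
The ordering can also be verified by parsing the file. The following is a rough Python sketch under my own assumptions: the helper names hosts_sources and files_before_dns are hypothetical, and the parser only handles the common "hosts:" line layout with optional bracketed action items.

```python
import re
from typing import List


def hosts_sources(nsswitch_text: str) -> List[str]:
    """Return the lookup sources on the 'hosts:' line, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("hosts:"):
            continue
        spec = line[len("hosts:"):]
        # Remove bracketed action items such as [NOTFOUND=return].
        spec = re.sub(r"\[[^\]]*\]", " ", spec)
        return spec.split()
    return []


def files_before_dns(nsswitch_text: str) -> bool:
    """True if 'files' is listed and comes before 'dns' (or 'dns' is absent)."""
    sources = hosts_sources(nsswitch_text)
    if "files" not in sources:
        return False
    if "dns" not in sources:
        return True
    return sources.index("files") < sources.index("dns")
```

To use it, pass in the contents of /etc/nsswitch.conf, for example files_before_dns(open("/etc/nsswitch.conf").read()). Note that "files" first only protects localhost if /etc/hosts actually defines it as 127.0.0.1.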

Thanks for everyone's help!

From: Radoulov, Dimitre [mailto:cichomitiko_at_gmail.com]
Sent: Saturday, November 02, 2013 6:27 AM
To: Stephens, Chris
Cc: oracle-l_at_freelists.org
Subject: Re: 11.2.0.4 RAC ora.ons and ora.oc4j resources down

Hi Chris,

On 01/11/2013 16:20, Stephens, Chris wrote:

Been through there. The only thing I recognize is a circular reference to itself.

CRS-5017: The resource action "ora.ons clean" encountered the following error: (:CLSN00009:)Utils:execCmd aborted. For details refer to "(:CLSN00106:)" in "/u01/app/11.2.0.4/grid/log/admoract1n2/agent/crsd/oraagent_grid/oraagent_grid.log".

2013-11-01 08:34:39.847: [ora.ons][1059063552]{1:28227:4343} [clean] (:CLSN00106:) clsn_agent::clean }
2013-11-01 08:34:39.848: [    AGFW][1061164800]{1:28227:4343} Agent sending reply for: RESOURCE_CLEAN[ora.ons admoract1n2 1] ID 4100:16936
2013-11-01 08:34:43.802: [ora.ons][1050851072]{1:28227:4343} [clean] got lock
2013-11-01 08:34:43.802: [ora.ons][1050851072]{1:28227:4343} [clean] tryActionLock }
2013-11-01 08:34:43.802: [ora.ons][1050851072]{1:28227:4343} [clean] abort  }
2013-11-01 08:34:43.802: [ora.ons][1050851072]{1:28227:4343} [clean] (:CLSN00110:) clsn_agent::abort }
2013-11-01 08:34:43.802: [    AGFW][1050851072]{1:28227:4343} Command: clean for resource: ora.ons admoract1n2 1 completed with status: TIMEDOUT

[...]

You could check whether /etc/hosts or /etc/nsswitch.conf has been modified recently and whether localhost is correctly defined to point to 127.0.0.1.

Regards
Dimitre

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 04 2013 - 20:26:12 CET
