Re: ctssd not running in observer mode in node 2

From: Balwanth B <balwanthdba_at_gmail.com>
Date: Wed, 15 Nov 2017 18:53:55 -0500
Message-ID: <CAL72EnAZ9SkOxTXyaj6kw-M6if-_gpmu8GC9WnMmoZn1dWLW9g_at_mail.gmail.com>



I see this in the cluster alert.log:

2017-06-29 10:47:28.377 [CRSD(1242)]CRS-1201: CRSD started on node rstnvahdmdb02.
2017-06-29 10:47:28.871 [ORAAGENT(1386)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 1386
2017-06-29 10:47:28.995 [ORAROOTAGENT(1409)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 1409
2017-07-11 04:36:37.046 [OCTSSD(54800)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 54800
2017-07-11 04:36:37.961 [OCTSSD(54800)]CRS-2403: The Cluster Time Synchronization Service on host rstnvahdmdb02 is in observer mode.
2017-07-11 04:36:38.203 [OCTSSD(54800)]CRS-8504: Oracle Clusterware OCTSSD process with operating system process ID 54800 is exiting
2017-07-11 04:36:38.963 [OHASD(837)]CRS-2878: Failed to restart resource 'ora.ctssd'
2017-07-14 12:15:59.835 [ORAAGENT(18443)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 18443
2017-11-14 04:00:34.704 [OCSSD(934)]CRS-1612: Network communication with node rstnvahdmdb01 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.735 seconds
2017-11-14 04:00:42.705 [OCSSD(934)]CRS-1611: Network communication with node rstnvahdmdb01 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.733 seconds
2017-11-14 04:00:46.706 [OCSSD(934)]CRS-1610: Network communication with node rstnvahdmdb01 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.733 seconds
2017-11-14 04:00:47.702 [OHASD(837)]CRS-8011: reboot advisory message from host: rstnvahdmdb01, component: cssagent, with time stamp: L-2017-11-14-04:00:47.630
2017-11-14 04:00:47.702 [OHASD(837)]CRS-8013: reboot advisory message text: Rebooting HUB node after limit 28137 exceeded; disk timeout 27856, network timeout 28137, last heartbeat from CSSD at epoch seconds 1510650019.440, 28189 milliseconds ago based on invariant clock value of 111401425
2017-11-14 04:00:49.440 [OCSSD(934)]CRS-1632: Node rstnvahdmdb01 is being removed from the cluster in cluster incarnation 390330971
2017-11-14 04:00:49.466 [OCSSD(934)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rstnvahdmdb02 .
2017-11-14 04:00:50.704 [CRSD(1242)]CRS-5504: Node down event reported for node 'rstnvahdmdb01'.
2017-11-14 04:00:51.499 [ORAAGENT(41818)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 41818
2017-11-14 04:00:52.886 [CRSCTL(41956)]CRS-4743: File /u01/app/12.1.0/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was updated from OCR(Size: 13365(New), 13378(Old) bytes)
2017-11-14 04:03:06.228 [OLOGGERD(55127)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 55127
2017-11-14 04:05:20.342 [SRVM(42611)]CRS-10051: CVU found following errors with Clusterware setup : PRVG-1453 : Oracle CTSS resource is not in ONLINE state on nodes "rstnvahdmdb02"
PRCW-1015 : Wallet hdmresdb does not exist.
CLSW-9: The cluster wallet to be operated on does not exist. :[1015]
Node "rstnvahdmdb01" is not reachable
PRKN-1035 : Host "rstnvahdmdb01" is unreachable

2017-11-14 04:05:29.441 [CRSD(1242)]CRS-2773: Server 'rstnvahdmdb01' has been removed from pool 'ora.HDMRESDB'.
2017-11-14 04:05:29.441 [CRSD(1242)]CRS-2773: Server 'rstnvahdmdb01' has been removed from pool 'Generic'.
2017-11-14 04:09:10.249 [OCSSD(934)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rstnvahdmdb01 rstnvahdmdb02 .
2017-11-14 04:09:53.588 [CRSD(1242)]CRS-2772: Server 'rstnvahdmdb01' has been assigned to pool 'Generic'.
2017-11-14 04:09:53.591 [CRSD(1242)]CRS-2772: Server 'rstnvahdmdb01' has been assigned to pool 'ora.HDMRESDB'.
2017-11-14 10:07:38.509 [SRVM(18542)]CRS-10051: CVU found following errors with Clusterware setup : PRVG-1453 : Oracle CTSS resource is not in ONLINE state on nodes "rstnvahdmdb02"

PRCW-1015 : Wallet hdmresdb does not exist.
CLSW-9: The cluster wallet to be operated on does not exist. :[1015]
PRCW-1015 : Wallet hdmresdb does not exist.

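(The key line above is CRS-2878: OHASD could not restart ora.ctssd on node 2.
Since ora.ctssd is an OHASD-managed resource, its state is queried at the
"-init" level; a minimal first check, assuming the grid home path shown in
the logs, would be, run as root on node 2:

/u01/app/12.1.0/grid/bin/crsctl stat res ora.ctssd -init

to see what state OHASD currently holds for the resource.)
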
On Wed, Nov 15, 2017 at 6:40 PM, Balwanth B <balwanthdba_at_gmail.com> wrote:

> Hello,
>
> We see the error below in the cluster alert.log. The cluster is running
> fine, but we keep getting these alerts. We did not see this issue before;
> we recently failed over from node 1 to node 2.
>
> uname -a
> SunOS rstnvahdmdb02 5.11 11.3 sun4v sparc sun4v
>
> Error seen:
>
> 2017-11-15 16:07:42.317 [SRVM(43764)]CRS-10051: CVU found following errors
> with Clusterware setup : PRVF-5408 : NTP Time Server "192.5.41.41" is
> common only to the following nodes "rstnvahdmdb02"
> PRVF-5408 : NTP Time Server "192.5.41.209" is common only to the following
> nodes "rstnvahdmdb01"
> PRVF-5416 : Query of NTP daemon failed on all nodes
> PRVG-1453 : Oracle CTSS resource is not in ONLINE state on nodes
> "rstnvahdmdb02"
> PRCW-1015 : Wallet hdmresdb does not exist.
> CLSW-9: The cluster wallet to be operated on does not exist. :[1015]
> PRCW-1015 : Wallet hdmresdb does not exist.
> CLSW-9: The cluster wallet to be operated on does not exist. :[1015]
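>
> (The two PRVF-5408 lines say the nodes do not see a common set of NTP time
> servers. A quick, hedged way to compare the configurations, assuming root
> ssh equivalence between the nodes, is:
>
> ssh rstnvahdmdb01 cat /etc/inet/ntp.conf | diff - /etc/inet/ntp.conf
>
> Any difference in the "server" lines there would explain PRVF-5408.)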
>
> Related info:
>
>
> [grid_at_rstnvahdmdb02]$ /u01/app/12.1.0/grid/bin/cluvfy comp clocksync
> -noctss -n rstnvahdmdb01,rstnvahdmdb02 -verbose
>
>
> Verifying Clock Synchronization across the cluster nodes
>
> Starting Clock synchronization checks using Network Time Protocol(NTP)...
>
> Checking existence of NTP configuration file "/etc/inet/ntp.conf" across
> nodes
> Node Name File exists?
> ------------------------------------ ------------------------
> rstnvahdmdb02 yes
> rstnvahdmdb01 yes
> The NTP configuration file "/etc/inet/ntp.conf" is available on all nodes
> NTP configuration file "/etc/inet/ntp.conf" existence check passed
>
> Checking daemon liveness...
>
> Check: Liveness for "ntpd"
> Node Name Running?
> ------------------------------------ ------------------------
> rstnvahdmdb02 yes
> rstnvahdmdb01 yes
> Result: Liveness check passed for "ntpd"
> Check for NTP daemon or service alive passed on all nodes
>
> Checking whether NTP daemon or service is using UDP port 123 on all nodes
>
> Check for NTP daemon or service using UDP port 123
> Node Name Port Open?
> ------------------------------------ ------------------------
> rstnvahdmdb02 yes
> rstnvahdmdb01 yes
>
> NTP common Time Server Check started...
> NTP Time Server "192.5.41.209" is common to all nodes on which the NTP
> daemon is running
> Check of common NTP Time Server passed
>
> Clock time offset check from NTP Time Server started...
> Checking on nodes "[rstnvahdmdb02, rstnvahdmdb01]"...
> Check: Clock time offset from NTP Time Server
>
> Time Server: 192.5.41.209
> Time Offset Limit: 1000.0 msecs
> Node Name Time Offset Status
> ------------ ------------------------ ------------------------
> rstnvahdmdb02 -9.794 passed
> rstnvahdmdb01 14.599 passed
> Time Server "192.5.41.209" has time offsets that are within permissible
> limits for nodes "[rstnvahdmdb02, rstnvahdmdb01]".
> Clock time offset check passed
>
> Result: Clock synchronization check using Network Time Protocol(NTP) passed
>
>
> Oracle Cluster Time Synchronization Services check passed
>
> Verification of Clock Synchronization across the cluster nodes was
> successful.
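>
> (Note that the run above used -noctss, so it validated only NTP, not the
> CTSS resource itself. Dropping that flag should make cluvfy exercise the
> CTSS check directly, e.g.:
>
> /u01/app/12.1.0/grid/bin/cluvfy comp clocksync -n rstnvahdmdb01,rstnvahdmdb02 -verbose)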
>
> From Node 1
>
>
> root_at_rstnvahdmdb01:~# ps -ef|grep -i ctssd
> root 2577 1 0 Nov 14 ? 10:37
> /u01/app/12.1.0/grid/bin/octssd.bin reboot
> root 52678 39847 0 17:27:01 pts/2 0:00 grep -i ctssd
> root_at_rstnvahdmdb01:~# ps -ef|grep -i ntp
> root 668 1 0 Nov 14 ? 0:09 /usr/lib/inet/ntpd
> --pidfile /var/run/ntp.pid --panicgate
> root 52850 39847 0 17:27:07 pts/2 0:00 grep -i ntp
>
> From Node 2
>
> root_at_rstnvahdmdb02:~# ps -ef|grep -i ntp
> root 606 1 0 May 16 ? 18:41 /usr/lib/inet/ntpd
> --pidfile /var/run/ntp.pid --panicgate --logfile /var/ntp/ntp.log
> root 21578 57373 0 17:27:15 pts/3 0:00 grep -i ntp
>
> root_at_rstnvahdmdb02:~# ps -ef|grep ctssd
> root 33426 57373 0 17:56:52 pts/3 0:00 grep ctssd
>
>
> from node 1
>
> root_at_rstnvahdmdb01:~# /usr/sbin/ntpq -pn
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +206.46.245.7    192.5.41.209    2 u  446 1024  377   69.957   14.599  56.949
> *206.46.245.6    192.5.41.40     2 u  247 1024  377   40.694   10.985  35.162
>
>
> from node 2
>
> root_at_rstnvahdmdb02:~# /usr/sbin/ntpq -pn
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> *206.46.245.6    192.5.41.41     2 u  115 1024  377   54.859   15.140  36.879
> +206.46.245.7    192.5.41.209    2 u   22 1024  377   73.357   -9.782  20.544
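>
> (In the ntpq output the leading "*" marks the peer each daemon has actually
> selected; right now the two nodes have selected upstreams with different
> refids (192.5.41.40 on node 1 vs 192.5.41.41 on node 2), which lines up
> with the PRVF-5408 messages. The selected system peer can also be read
> directly with, for example:
>
> /usr/sbin/ntpq -c "rv 0")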
>
> Node 1
> ora.ctssd
> 1 ONLINE ONLINE rstnvahdmdb01
> OBSERVER,STABLE
>
> Node 2
>
> ora.ctssd
> 1 ONLINE OFFLINE STABLE
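>
> (Given the OFFLINE state above, I assume the next step, run as root on
> node 2, is something like:
>
> /u01/app/12.1.0/grid/bin/crsctl start res ora.ctssd -init
>
> followed by a re-check with "crsctl stat res ora.ctssd -init". A sketch
> only; ora.ctssd is OHASD-managed, hence the -init flag.)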
>
> Thanks,
>
> Balwanth

--
http://www.freelists.org/webpage/oracle-l