[check] Error = error 11 encountered when sending messages to CRSD (11.2.0.1 SLES SP3) [message #617633] Wed, 02 July 2014 06:59
juniordbanewbie
Messages: 14
Registered: April 2014
Junior Member
I've been trying to figure out why my crsd daemon has not been able to start.


grid@ORAC02:~> crsctl check cluster
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
grid@ORAC02:~>



grid@ORAC02:~> crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       orac02                   Started
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       orac02
ora.cssdmonitor
      1        ONLINE  ONLINE       orac02
ora.ctssd
      1        ONLINE  ONLINE       orac02                   OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       orac02
ora.evmd
      1        ONLINE  ONLINE       orac02
ora.gipcd
      1        ONLINE  ONLINE       orac02
ora.gpnpd
      1        ONLINE  ONLINE       orac02
ora.mdnsd
      1        ONLINE  ONLINE       orac02


I've followed both MOS notes: Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1) and 11gR2 Clusterware and Grid Home - What You Need to Know (Doc ID 1053147.1).

I am most likely stuck at Level 1 (OHASD rootagent), since the Level 3 CRSD agents are not spawned. Please correct me if I'm wrong.

When I look at $GRID_HOME/log/orac02/agent/ohasd/orarootagent_root/orarootagent_root.log, I find the following, which could be the problem:


 374 2014-07-02 18:40:16.350: [ora.crsd][2944395008] [start] PID will be looked for in /u01/app/11.2.0/grid/crs/init/orac02.pid
 375 2014-07-02 18:40:16.350: [ora.crsd][2944395008] [start] PID which will be monitored will be 11841
 376 2014-07-02 18:40:16.350: [ora.crsd][2944395008] [start] }DaemonAgent::start
 377 2014-07-02 18:40:16.351: [ora.crsd][2944395008] [start] clsn_agent::start }
 378 2014-07-02 18:40:16.351: [    AGFW][2944395008] Command: start for resource: ora.crsd 1 1 completed with status: SUCCESS
 379 2014-07-02 18:40:16.351: [    AGFW][2944395008] Executing command: check for resource: ora.crsd 1 1
 380 2014-07-02 18:40:16.352: [    AGFW][2927609600] Agent sending reply for: RESOURCE_START[ora.crsd 1 1] ID 4098:629
 381 2014-07-02 18:40:16.354: [ora.crsd][2944395008] [check] clsdmc_respget return: status=0, ecode=ffffff
 382 2014-07-02 18:40:16.354: [ USRTHRD][2944395008] Thread:[DaemonCheck:crsd]start {
 383 2014-07-02 18:40:16.354: [ USRTHRD][2944395008] Thread:[DaemonCheck:crsd]start }
 384 2014-07-02 18:40:16.354: [ora.crsd][2944395008] [check] DaemonAgent::check returned 0
 385 2014-07-02 18:40:16.355: [    AGFW][2944395008] check for resource: ora.crsd 1 1 completed with status: PARTIAL
 386 2014-07-02 18:40:16.355: [    AGFW][2927609600] ora.crsd 1 1 state changed from: STARTING to: PARTIAL
 387 2014-07-02 18:40:16.355: [    AGFW][2927609600] Started implicit monitor for:ora.crsd 1 1
 388 2014-07-02 18:40:16.355: [    AGFW][2927609600] Agent sending last reply for: RESOURCE_START[ora.crsd 1 1] ID 4098:629
 389 2014-07-02 18:40:19.416: [ USRTHRD][2793391872] Thread:[DaemonCheck:crsd]Thread exiting
 390 2014-07-02 18:40:19.416: [ USRTHRD][2793391872] Thread:[DaemonCheck:crsd]Initiating a check action
 391 2014-07-02 18:40:19.417: [ USRTHRD][2793391872] Check action requested by agent etnry point for ora.crsd
 392 2014-07-02 18:40:19.417: [    AGFW][2927609600] Agent received the message: RESOURCE_PROBE[ora.crsd 1 1] ID 4097:81
 393 2014-07-02 18:40:19.417: [    AGFW][2927609600] Preparing CHECK command for: ora.crsd 1 1
 394 2014-07-02 18:40:19.417: [CLSFRAME][3030554368] TM [MultiThread] is changing desired thread # to 3. Current # is 2
 395 2014-07-02 18:40:19.417: [    AGFW][2944395008] Executing command: check for resource: ora.crsd 1 1
 396 2014-07-02 18:40:19.418: [ COMMCRS][2944395008]clscsendx: (0x7f4fb00453d0) Physical connection (0x7f4fb0044f30) not active
 397
 398 [  clsdmc][2944395008]Failed to send meta message to connection [(ADDRESS=(PROTOCOL=ipc)(KEY=orac02DBG_CRSD))][11]
 399 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] Error = error 11 encountered when sending messages to CRSD
 400 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] Calling PID check for daemon
 401 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] Trying to check PID = 11841
 402 2014-07-02 18:40:19.418: [ COMMCRS][2944395008]clscsendx: (0x7f4fb00453d0) Connection not active
 403
 404 [  clsdmc][2944395008]Failed to send meta message to connection [(ADDRESS=(PROTOCOL=ipc)(KEY=orac02DBG_CRSD))][6]
 405 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] Error = error 6 encountered when sending messages to CRSD
 406 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] DaemonAgent::check returned 1
 407 2014-07-02 18:40:19.418: [    AGFW][2944395008] check for resource: ora.crsd 1 1 completed with status: OFFLINE
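
The excerpt above is easier to follow once it is reduced to the ora.crsd status changes and the send errors. A minimal sketch of such a filter follows; the function name crsd_check_events is hypothetical, and the patterns are taken only from the lines quoted in this post, so other log versions may word things differently.

```shell
# Hypothetical helper: print only the ora.crsd status transitions and
# "Error = error N" lines from an orarootagent log given as $1.
# The patterns are assumptions based on the excerpt quoted in this thread.
crsd_check_events() {
    grep -E '\[ora\.crsd\].*Error = error [0-9]+|ora\.crsd 1 1 (completed with status|state changed)' "$1"
}
```

It could be run as crsd_check_events $GRID_HOME/log/orac02/agent/ohasd/orarootagent_root/orarootagent_root.log, which for the excerpt above would show the sequence SUCCESS, PARTIAL, error 11, error 6, OFFLINE.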


I've done a PID check:


grid@ORAC02:~> cat /u01/app/11.2.0/grid/crs/init/orac02.pid
14270
grid@ORAC02:~> ps -ef | grep 14270
grid     27210 17918  0 19:40 pts/1    00:00:00 grep 14270



There is no process with PID 14270. Also, note that the PID I obtained is different from the one reported in orarootagent_root.log (11841).

Nor could I find any crsd process with PID 11841 in the OS:

grid@ORAC02:~> ps -ef | grep 11841
grid     27913 17918  0 19:43 pts/1    00:00:00 grep 11841
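
Both ps -ef | grep checks above return only the grep process itself, which is easy to misread. A minimal sketch of a check that cannot match itself is shown below. It assumes a Linux /proc filesystem (as on SLES), and the function name pid_file_status is hypothetical.

```shell
# Hypothetical helper: report whether the PID stored in a pid file is alive.
# Assumes Linux, where every live process has a /proc/<pid> directory.
# This also works for processes owned by another user (e.g. root).
pid_file_status() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "unreadable: $pidfile"
        return 2
    fi
    # Strip whitespace and newlines from the stored value.
    pid=$(tr -d '[:space:]' < "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) echo "invalid pid in $pidfile"; return 2 ;;
    esac
    if [ -d "/proc/$pid" ]; then
        echo "alive: $pid"
    else
        echo "dead: $pid"
        return 1
    fi
}
```

For example, pid_file_status /u01/app/11.2.0/grid/crs/init/orac02.pid would report whether the PID recorded there still exists.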




I also ran an ASM cluvfy check:

grid@ORAC02:~> cluvfy comp asm -n orac02 -verbose

Verifying ASM Integrity

Task ASM Integrity check started...

Starting check to see if ASM is running on all cluster nodes...
PRVF-5137 : Failure while checking ASM status on node "orac02"

Starting Disk Groups check to see if at least one Disk Group configured...
PRVF-5112 : An Exception occurred while checking for Disk Groups
PRVF-5114 : Disk Group check failed. No Disk Groups configured

Task ASM Integrity check failed...

Verification of ASM Integrity was unsuccessful on all the specified nodes.



The above error is something I expected, since according to 1053147.1 crsd spawns oraagent, which in turn starts the ASM resource (the ASM instance(s) resource).

However, what I find puzzling is that the +ASM2 instance is not only up, but both the OCR and OCR mirror disk groups are mounted.


SYS@+ASM2>SELECT name, state FROM v$asm_diskgroup ORDER BY name;

NAME                           STATE
------------------------------ -----------
DATA                           DISMOUNTED
FRA                            DISMOUNTED
OCR_VOTE                       MOUNTED
OCR_VOTE_MIRROR                MOUNTED



What I also do not understand is that, on the one hand, cluvfy says the ASM instance has an error, but on the other hand sqlplus shows no error. Why the contradiction?

Most important of all, what should I execute to make sure crsd is started?


Thanks a lot!
Re: [check] Error = error 11 encountered when sending messages to CRSD [message #617676 is a reply to message #617633] Wed, 02 July 2014 11:59
babuknb
Messages: 1731
Registered: December 2005
Location: NJ
Senior Member

Please post the output of the commands below:

ps -ef | grep d.bin
$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init


Is there any information in the CRSD and OS logs?

- Babu
Re: [check] Error = error 11 encountered when sending messages to CRSD [message #617707 is a reply to message #617676] Wed, 02 July 2014 18:58
juniordbanewbie
Messages: 14
Registered: April 2014
Junior Member

grid@ORAC02:~> ps -ef | grep d.bin
root      8928     1  2 Jul02 ?        00:17:32 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid      9074     1  0 Jul02 ?        00:02:38 /u01/app/11.2.0/grid/bin/oraagent.bin
grid      9090     1  0 Jul02 ?        00:00:01 /u01/app/11.2.0/grid/bin/gipcd.bin
grid      9095     1  0 Jul02 ?        00:00:01 /u01/app/11.2.0/grid/bin/mdnsd.bin
grid      9118     1  0 Jul02 ?        00:02:08 /u01/app/11.2.0/grid/bin/gpnpd.bin
root      9136     1  0 Jul02 ?        00:02:42 /u01/app/11.2.0/grid/bin/cssdmonitor
root      9160     1  0 Jul02 ?        00:02:41 /u01/app/11.2.0/grid/bin/cssdagent
grid      9200     1  1 Jul02 ?        00:08:53 /u01/app/11.2.0/grid/bin/ocssd.bin
root      9219     1  0 Jul02 ?        00:00:18 /u01/app/11.2.0/grid/bin/orarootagent.bin
grid      9235     1  0 Jul02 ?        00:02:02 /u01/app/11.2.0/grid/bin/diskmon.bin -d -f
root      9973     1  0 Jul02 ?        00:00:05 /u01/app/11.2.0/grid/bin/octssd.bin reboot
grid     10005     1  0 Jul02 ?        00:03:52 /u01/app/11.2.0/grid/bin/evmd.bin
grid     10754     1  0 Jul02 ?        00:01:37 /u01/app/11.2.0/grid/bin/oclskd.bin
grid     12046 10005  0 Jul02 ?        00:00:00 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log
grid     25928 25871  0 07:52 pts/0    00:00:00 grep d.bin
grid@ORAC02:~> $GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init
CRS-2613: Could not find resource 'ora.cluster_interconnect.haip'.
grid@ORAC02:~>

I could not find anything unusual in crsd.log.
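
Note that the ps listing above contains no crsd.bin line. A small sketch that makes this kind of gap explicit is shown below. The function name missing_daemons is hypothetical, and the list of expected daemon names is supplied by the caller, not taken from any Oracle documentation.

```shell
# Hypothetical helper: read a `ps -ef` listing on stdin and print each
# daemon name given as an argument that does not appear in it.
# Matching on "/<name>" avoids counting the grep command line itself.
missing_daemons() {
    listing=$(cat)
    for name in "$@"; do
        if ! printf '%s\n' "$listing" | grep -qF -- "/$name"; then
            echo "$name"
        fi
    done
}
```

Run as ps -ef | missing_daemons ohasd.bin ocssd.bin crsd.bin evmd.bin, it prints the names from that list that have no running process.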
Re: [check] Error = error 11 encountered when sending messages to CRSD [message #617711 is a reply to message #617707] Wed, 02 July 2014 21:09
babuknb
Messages: 1731
Registered: December 2005
Location: NJ
Senior Member

Thanks. For troubleshooting, have you tried stopping the cluster on both nodes and starting only ORAC02, to validate CRSD?
Re: [check] Error = error 11 encountered when sending messages to CRSD [message #617712 is a reply to message #617711] Wed, 02 July 2014 21:39
juniordbanewbie
Messages: 14
Registered: April 2014
Junior Member
Yes, I have stopped both nodes.

I started node 1 while node 2 was completely down. I ran various cluvfy and srvctl tests on orac01, and none of them failed.

But the same cannot be said when node 2 is up and node 1 is down.

Only a few cluvfy and srvctl tests on orac02 pass; the rest fail.
Re: [check] Error = error 11 encountered when sending messages to CRSD [message #617713 is a reply to message #617712] Wed, 02 July 2014 22:05
babuknb
Messages: 1731
Registered: December 2005
Location: NJ
Senior Member


394 2014-07-02 18:40:19.417: [CLSFRAME][3030554368] TM [MultiThread] is changing desired thread # to 3. Current # is 2
 395 2014-07-02 18:40:19.417: [    AGFW][2944395008] Executing command: check for resource: ora.crsd 1 1
 396 2014-07-02 18:40:19.418: [ COMMCRS][2944395008]clscsendx: (0x7f4fb00453d0) Physical connection (0x7f4fb0044f30) not active  -----------------> connection error
 397
 398 [  clsdmc][2944395008]Failed to send meta message to connection [(ADDRESS=(PROTOCOL=ipc)(KEY=orac02DBG_CRSD))][11]
 399 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] Error = error 11 encountered when sending messages to CRSD
 400 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] Calling PID check for daemon
 401 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] Trying to check PID = 11841
 402 2014-07-02 18:40:19.418: [ COMMCRS][2944395008]clscsendx: (0x7f4fb00453d0) Connection not active   -------------------> since the connection is not active, the send fails
 403
 404 [  clsdmc][2944395008]Failed to send meta message to connection [(ADDRESS=(PROTOCOL=ipc)(KEY=orac02DBG_CRSD))][6]
 405 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] Error = error 6 encountered when sending messages to CRSD ----------------> CRSD is down, so the check reports OFFLINE
 406 2014-07-02 18:40:19.418: [ora.crsd][2944395008] [check] DaemonAgent::check returned 1
 407 2014-07-02 18:40:19.418: [    AGFW][2944395008] check for resource: ora.crsd 1 1 completed with status: OFFLINE



OK, looking at the logs again: the check most likely failed due to a connection error. When the connection was not active, CRSD was reported OFFLINE and the check failed. As you said, sqlplus works fine but cluvfy does not.

Now, can you validate the network settings? I mean network availability, checked with cluvfy on both nodes, especially from node B. Also, please validate the availability of the DATA and FRA ASM LUNs or storage disks on node B.
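
One way to run the network validation suggested above is the cluvfy node connectivity component. The wrapper below is only a sketch: the function name check_nodecon is hypothetical, the node names are the ones mentioned in this thread, and it just prints the intended command when cluvfy is not on the PATH.

```shell
# Hypothetical wrapper around the cluvfy node connectivity check.
# $1 is a comma-separated node list, e.g. orac01,orac02.
check_nodecon() {
    nodes=$1
    if command -v cluvfy > /dev/null 2>&1; then
        cluvfy comp nodecon -n "$nodes" -verbose
    else
        echo "cluvfy not on PATH; would run: cluvfy comp nodecon -n $nodes -verbose"
    fi
}
```

Running check_nodecon orac01,orac02 as the grid user on each node in turn would show whether connectivity differs when checked from node B.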