Nodes Get Rebooted (Oracle 10g Enterprise Edition Release 10.2.0.1.0, Oracle Enterprise Linux 4)
Nodes Get Rebooted [message #404730] Sat, 23 May 2009 01:26
kumarrajnishgupta
Messages: 43
Registered: October 2008
Location: noida
Member

Dear Friends,
I followed the document "Build Your Own Oracle RAC Cluster on Oracle Enterprise Linux and iSCSI" by Jeffrey Hunter at http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html
The installation went perfectly fine with no problems at all, but both my nodes get rebooted unexpectedly, or when creating a tablespace or running RMAN to take a backup. If more information is required, I will provide it.
With regards,
rajnish
Re: Nodes Get Rebooted [message #404732 is a reply to message #404730] Sat, 23 May 2009 01:40
Mahesh Rajendran
Messages: 10626
Registered: March 2002
Location: oracleDocoVille
Senior Member
Account Moderator
What do the logs say?
Re: Nodes Get Rebooted [message #404735 is a reply to message #404732] Sat, 23 May 2009 01:47
kumarrajnishgupta

Dear sir,
Please tell us which logs you need to troubleshoot this, and I will share them with you.
With regards,
rajnish
Re: Nodes Get Rebooted [message #404756 is a reply to message #404735] Sat, 23 May 2009 06:54
Mahesh Rajendran
The database alert log (alert_<sid>.log) and the CRS logs (inside the CRS home).
Just post the relevant contents.
Re: Nodes Get Rebooted [message #404901 is a reply to message #404756] Mon, 25 May 2009 06:15
kumarrajnishgupta

Dear sir,
After some googling, I found a suggestion to increase the "Heartbeat dead threshold" in the OCFS2 configuration. I set it to about 500 seconds and thought the problem was resolved, but today both my nodes hung again. I am posting the logs here.
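
For anyone checking the arithmetic: the OCFS2 "heartbeat dead threshold" (O2CB_HEARTBEAT_THRESHOLD) is a count of heartbeat iterations, not a number of seconds. Assuming the relationship usually quoted in the OCFS2 documentation (one iteration every 2 seconds, timeout = (threshold - 1) * 2), a rough Python sketch of the conversion could look like this. The function names are made up for illustration, and the 2-second interval is an assumption worth confirming against your OCFS2 release.

```python
# Assumption: the o2cb disk heartbeat runs once every 2 seconds and the
# node is declared dead after (threshold - 1) missed iterations.
HEARTBEAT_INTERVAL_SECONDS = 2


def threshold_for_timeout(timeout_seconds: int) -> int:
    """Return the O2CB_HEARTBEAT_THRESHOLD giving roughly this timeout."""
    return timeout_seconds // HEARTBEAT_INTERVAL_SECONDS + 1


def timeout_for_threshold(threshold: int) -> int:
    """Return the timeout in seconds implied by a given threshold."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


if __name__ == "__main__":
    print(threshold_for_timeout(500))  # 251
    print(timeout_for_threshold(61))   # 120
```

Under that assumption, a timeout of about 500 seconds corresponds to a threshold of 251, not 500.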

/u01/app/crs/log/linux1/alertlinux1.log

2009-05-25 13:35:29.331
[cssd(8441)]CRS-1606:CSSD Insufficient voting files available [1 of 3]. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:50:24.895
[cssd(14622)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:50:24.949
[cssd(14622)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror1. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:50:24.949
[cssd(14622)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror2. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:50:28.682
[cssd(14622)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
2009-05-25 14:50:29.334
[crsd(6327)]CRS-1012:The OCR service started on node linux1.
2009-05-25 14:50:29.612
[evmd(14510)]CRS-1401:EVMD started on node linux1.
2009-05-25 14:50:33.724
[crsd(6327)]CRS-1201:CRSD started on node linux1.
2009-05-25 14:51:02.995
[cssd(14622)]CRS-1603:CSSD on node linux1 shutdown by user.
2009-05-25 14:56:05.799
[cssd(8711)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:56:05.883
[cssd(8711)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror1. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:56:05.929
[cssd(8711)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror2. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 14:56:07.013
[cssd(8711)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 .
2009-05-25 14:56:07.668
[crsd(7125)]CRS-1012:The OCR service started on node linux1.
2009-05-25 14:56:07.677
[evmd(8580)]CRS-1401:EVMD started on node linux1.
2009-05-25 14:56:11.616
[crsd(7125)]CRS-1201:CRSD started on node linux1.
2009-05-25 14:56:45.170
[cssd(8711)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
2009-05-25 15:17:10.338
[cssd(8709)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:17:10.338
[cssd(8709)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror2. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:17:10.338
[cssd(8709)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror1. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:20:22.769
[cssd(8709)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 .
2009-05-25 15:20:23.298
[crsd(7152)]CRS-1012:The OCR service started on node linux1.
2009-05-25 15:20:23.301
[evmd(8577)]CRS-1401:EVMD started on node linux1.
2009-05-25 15:20:27.259
[crsd(7152)]CRS-1201:CRSD started on node linux1.
2009-05-25 15:35:08.831
[cssd(8592)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror1. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:35:08.850
[cssd(8592)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:35:08.956
[cssd(8592)]CRS-1605:CSSD voting file is online: /u02/oradata/orcl/CSSFile_mirror2. Details in /u01/app/crs/log/linux1/cssd/ocssd.log.
2009-05-25 15:35:12.811
[cssd(8592)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
2009-05-25 15:35:13.621
[crsd(7181)]CRS-1012:The OCR service started on node linux1.
2009-05-25 15:35:13.632
[evmd(8450)]CRS-1401:EVMD started on node linux1.
2009-05-25 15:35:18.120
[crsd(7181)]CRS-1201:CRSD started on node linux1.
2009-05-25 15:49:00.400
[crsd(4379)]CRS-1012:The OCR service started on node linux1.
2009-05-25 15:49:11.361
[crsd(4379)]CRS-1201:CRSD started on node linux1.
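
To make an alert log like the one above easier to read, it can help to pair each timestamp line with the message that follows it and note every time the cssd PID changes, since a new PID means the daemon was restarted. The following is only a sketch in Python: the regular expressions are modelled on the lines quoted in this post, and the function names are hypothetical.

```python
import re

# Message lines look like:
# [cssd(14622)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
MSG_RE = re.compile(
    r"^\[(?P<daemon>\w+)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):(?P<text>.*)$"
)
# Timestamp lines look like: 2009-05-25 14:50:28.682
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")


def parse_alert_log(lines):
    """Yield (timestamp, daemon, pid, code, text) for each timestamp/message pair."""
    timestamp = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            timestamp = line
            continue
        m = MSG_RE.match(line)
        if m and timestamp is not None:
            yield (timestamp, m["daemon"], int(m["pid"]), m["code"], m["text"].strip())
            timestamp = None


def cssd_generations(entries):
    """Map each distinct cssd PID to the first timestamp at which it appears."""
    seen = {}
    for ts, daemon, pid, _code, _text in entries:
        if daemon == "cssd" and pid not in seen:
            seen[pid] = ts
    return seen


# A few lines copied from the log above.
SAMPLE = """\
2009-05-25 14:50:28.682
[cssd(14622)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 linux2 .
2009-05-25 14:51:02.995
[cssd(14622)]CRS-1603:CSSD on node linux1 shutdown by user.
2009-05-25 14:56:07.013
[cssd(8711)]CRS-1601:CSSD Reconfiguration complete. Active nodes are linux1 .
"""

if __name__ == "__main__":
    entries = list(parse_alert_log(SAMPLE.splitlines()))
    for pid, first_seen in cssd_generations(entries).items():
        print(f"cssd pid {pid} first seen at {first_seen}")
```

Run against the full file, each distinct cssd PID marks one start of the CSS daemon on that node.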

crsd log

2009-05-25 15:47:56.795: [ CRSEVT][3770674080]0CAAMonitorHandler :: 0:Action Script /u01/app/crs/bin/racgwrap(check) timed out for ora.linux1.gsd! (timeout=600)
2009-05-25 15:47:56.795: [ CRSAPP][3770674080]0CheckResource error for ora.linux1.gsd error code = -2
2009-05-25 15:47:56.795: [ CRSEVT][3654159264]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/db_1/bin/racgwrap(check) timed out for ora.orcl.orcl1.inst! (timeout=600)
2009-05-25 15:47:56.795: [ CRSAPP][3654159264]0CheckResource error for ora.orcl.orcl1.inst error code = -2
2009-05-25 15:47:56.896: [ CRSEVT][3696118688]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/db_1/bin/racgwrap(check) timed out for ora.linux1.ASM1.asm! (timeout=600)
2009-05-25 15:47:56.896: [ CRSAPP][3696118688]0CheckResource error for ora.linux1.ASM1.asm error code = -2
2009-05-25 15:47:56.901: [ CRSEVT][3717098400]0CAAMonitorHandler :: 0:Action Script /u01/app/crs/bin/racgwrap(check) timed out for ora.linux1.vip! (timeout=60)
2009-05-25 15:47:56.902: [ CRSAPP][3717098400]0CheckResource error for ora.linux1.vip error code = -2
2009-05-25 15:47:56.953: [ CRSEVT][3781163936]0CAAMonitorHandler :: 0:Action Script /u01/app/crs/bin/racgwrap(check) timed out for ora.linux1.ons! (timeout=600)
2009-05-25 15:47:56.953: [ CRSAPP][3781163936]0CheckResource error for ora.linux1.ons error code = -2
2009-05-25 15:47:56.953: [ CRSEVT][3643669408]0CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/db_1/bin/racgwrap(check) timed out for ora.linux1.LISTENER_LINUX1.lsnr! (timeout=600)
2009-05-25 15:47:56.953: [ CRSAPP][3643669408]0CheckResource error for ora.linux1.LISTENER_LINUX1.lsnr error code = -2
2009-05-25 15:48:59.687: [ default][4143806144][ENTER]0
2009-05-25 15:49:11.361: [ CRSMAIN][4143806144]0Starting Threads
2009-05-25 15:49:11.361: [ CRSMAIN][4143806144]0CRS Daemon Started.
2009-05-25 16:08:06.537: [ OCRSRV][4098997152]th_select_handler: Failed to retrieve procctx from ht. constr = [145255720] retval lht [-27] Signal CV.
2009-05-25 16:18:09.476: [ OCRSRV][4098997152]th_select_handler: Failed to retrieve procctx from ht. constr = [145255720] retval lht [-27] Signal CV.
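
In the crsd excerpt above, the check scripts for six resources time out within about 160 ms of each other, which is at least consistent with the whole node stalling rather than one resource failing. A small Python sketch for pulling those timeouts out of a crsd log is shown below. The pattern is modelled only on the lines quoted here (other Clusterware versions may word the message differently), and the function name is hypothetical.

```python
import re

# Pattern modelled on the crsd lines quoted above (assumption: the message
# wording is the same throughout the log).
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r".*Action Script (?P<script>\S+)\(check\) timed out for "
    r"(?P<resource>\S+)! \(timeout=(?P<timeout>\d+)\)"
)


def find_check_timeouts(lines):
    """Return (timestamp, resource, timeout_seconds) for each timed-out check."""
    hits = []
    for line in lines:
        m = TIMEOUT_RE.match(line.strip())
        if m:
            hits.append((m["ts"], m["resource"], int(m["timeout"])))
    return hits


# Two lines copied from the crsd log above.
SAMPLE = [
    "2009-05-25 15:47:56.795: [ CRSEVT][3770674080]0CAAMonitorHandler :: 0:Action Script "
    "/u01/app/crs/bin/racgwrap(check) timed out for ora.linux1.gsd! (timeout=600)",
    "2009-05-25 15:47:56.901: [ CRSEVT][3717098400]0CAAMonitorHandler :: 0:Action Script "
    "/u01/app/crs/bin/racgwrap(check) timed out for ora.linux1.vip! (timeout=60)",
]

if __name__ == "__main__":
    for ts, resource, timeout in find_check_timeouts(SAMPLE):
        print(f"{ts}  {resource}  (timeout={timeout}s)")
```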

ocssd log

[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clscsendx: (0x8354460) Connection not active

[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clssgmSendClient: Send failed rc 6, con (0x8354460), client (0x8354660), proc ((nil))
[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clscsendx: (0x83548a8) Connection not active

[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clssgmSendClient: Send failed rc 6, con (0x83548a8), client (0x8354aa8), proc ((nil))
[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clscsendx: (0x83543e8) Connection not active

evmd log

2009-05-23 15:34:27.907: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

2009-05-23 15:34:28.910: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

2009-05-23 15:34:29.912: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

2009-05-23 15:34:30.915: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

2009-05-23 15:34:31.917: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

2009-05-23 15:34:32.920: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

2009-05-23 15:34:33.922: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

2009-05-23 15:34:34.925: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

2009-05-23 15:34:35.928: [ CSSCLNT][4143810784]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying

[ CSSD]2009-05-25 15:47:57.556 [4098169760] >TRACE: clssgmSendClient: Send failed rc 6, con (0x83543e8), client (0x8354f00), proc ((nil))
[ CSSD]2009-05-25 15:47:57.557 [4098169760] >TRACE: clscsendx: (0x83854b0) Connection not active

alert_orcl1.log

Cluster communication is configured to use the following interface(s) for this instance
192.168.2.100
Mon May 25 15:35:56 2009
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=10139
DIAG started with pid=3, OS id=10141
PSP0 started with pid=4, OS id=10143
LMON started with pid=5, OS id=10145
LMD0 started with pid=6, OS id=10147
LMS0 started with pid=7, OS id=10149
LMS1 started with pid=8, OS id=10159
MMAN started with pid=9, OS id=10169
DBW0 started with pid=10, OS id=10171
LGWR started with pid=11, OS id=10173
CKPT started with pid=12, OS id=10175
SMON started with pid=13, OS id=10177
RECO started with pid=14, OS id=10179
CJQ0 started with pid=15, OS id=10181
MMON started with pid=16, OS id=10183
Mon May 25 15:35:57 2009
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=17, OS id=10185
Mon May 25 15:35:57 2009
starting up 1 shared server(s) ...
Mon May 25 15:35:57 2009
lmon registered with NM - instance id 1 (internal mem no 0)
Mon May 25 15:35:57 2009
Reconfiguration started (old inc 0, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 not valid according to instance 1
* domain 0 valid = 0 according to instance 1
Mon May 25 15:35:58 2009
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon May 25 15:35:58 2009
LMS 0: 0 GCS shadows cancelled, 0 closed
Mon May 25 15:35:58 2009
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Mon May 25 15:35:58 2009
LMS 0: 0 GCS shadows traversed, 0 replayed
Mon May 25 15:35:58 2009
LMS 1: 0 GCS shadows traversed, 0 replayed
Mon May 25 15:35:58 2009
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=20, OS id=10241
Mon May 25 15:35:59 2009
ALTER DATABASE MOUNT
Mon May 25 15:36:07 2009
Starting background process ASMB
ASMB started with pid=22, OS id=10446
Starting background process RBAL
RBAL started with pid=23, OS id=10450
Mon May 25 15:36:15 2009
SUCCESS: diskgroup ORCL_DATA1 was mounted
SUCCESS: diskgroup FLASH_RECOVERY_AREA was mounted
Mon May 25 15:36:20 2009
Setting recovery target incarnation to 2
Mon May 25 15:36:22 2009
Successful mount of redo thread 1, with mount id 1215580665
Mon May 25 15:36:22 2009
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE MOUNT
Mon May 25 15:36:23 2009
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Mon May 25 15:36:51 2009
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=32, OS id=11587
Mon May 25 15:36:51 2009
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=33, OS id=11589
Mon May 25 15:36:54 2009
Thread 1 opened at log sequence 8
Current log# 1 seq# 8 mem# 0: +ORCL_DATA1/orcl/onlinelog/group_1.261.687634095
Current log# 1 seq# 8 mem# 1: +FLASH_RECOVERY_AREA/orcl/onlinelog/group_1.258.687634107
Successful open of redo thread 1
Mon May 25 15:36:54 2009
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon May 25 15:36:55 2009
ARC0: STARTING ARCH PROCESSES
Mon May 25 15:36:55 2009
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Mon May 25 15:36:55 2009
SMON: enabling cache recovery
Mon May 25 15:36:55 2009
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARC2 started with pid=34, OS id=11666
Mon May 25 15:37:05 2009
Successfully onlined Undo Tablespace 1.
Mon May 25 15:37:05 2009
SMON: enabling tx recovery
Mon May 25 15:37:05 2009
Database Characterset is WE8ISO8859P1
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=36, OS id=11970
Mon May 25 15:37:16 2009
Completed: ALTER DATABASE OPEN
Mon May 25 15:37:23 2009
ALTER SYSTEM SET service_names='orcl.idevelopment.info',' orcl_taf.idevelopment.info','orcl_taf' SCOPE=MEMORY SID='orcl1';
Mon May 25 15:42:51 2009
Shutting down archive processes
Mon May 25 15:42:56 2009
ARCH shutting down
ARC2: Archival stopped

Re: Nodes Get Rebooted [message #405259 is a reply to message #404901] Wed, 27 May 2009 04:14
Mahesh Rajendran
This seems to be an unpublished internal bug (#5507883), and the issue has reportedly been fixed in later patchsets.
Re: Nodes Get Rebooted [message #405912 is a reply to message #405259] Sun, 31 May 2009 11:28
kumarrajnishgupta

Dear sir,
As far as I can tell, this problem was caused by OCFS2 with 2 nodes. To solve it, I removed OCFS2 and now use raw devices for the voting disk and OCR. It is working fine now. Thank you for your great support.
With regards,
rajnish
Re: Nodes Get Rebooted [message #406013 is a reply to message #405912] Mon, 01 June 2009 11:45
babuknb
Messages: 1731
Registered: December 2005
Location: NJ
Senior Member

Can you share your OCFS2 version with us?

Thanks
Re: Nodes Get Rebooted [message #406228 is a reply to message #406013] Tue, 02 June 2009 22:57
kumarrajnishgupta

Hi,
These are the RPMs I was using for OCFS2:
ocfs2-2.6.9-78.0.0.0.1.ELhugemem-1.2.9-1.el4
ocfs2console-1.2.7-1.el4
ocfs2-tools-devel-1.2.7-1.el4
ocfs2-2.6.9-78.0.0.0.1.EL-1.2.9-1.el4
ocfs2-2.6.9-78.0.0.0.1.ELxenU-1.2.9-1.el4
ocfs2-tools-1.2.7-1.el4
ocfs2-2.6.9-78.0.0.0.1.ELsmp-1.2.9-1.el4
With regards,
rajnish
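
For readers trying to read the versions off the package list above: RPM names follow the name-version-release convention, so the last two hyphen-separated fields are the version and release, and everything before them (including the kernel string in the module packages) is the package name. A minimal Python sketch of that split follows; the helper name is made up for illustration.

```python
def split_nvr(package: str):
    """Split an RPM name-version-release string into (name, version, release)."""
    name, version, release = package.rsplit("-", 2)
    return name, version, release


# Package list copied from the post above.
PACKAGES = [
    "ocfs2-2.6.9-78.0.0.0.1.ELhugemem-1.2.9-1.el4",
    "ocfs2console-1.2.7-1.el4",
    "ocfs2-tools-devel-1.2.7-1.el4",
    "ocfs2-2.6.9-78.0.0.0.1.EL-1.2.9-1.el4",
    "ocfs2-2.6.9-78.0.0.0.1.ELxenU-1.2.9-1.el4",
    "ocfs2-tools-1.2.7-1.el4",
    "ocfs2-2.6.9-78.0.0.0.1.ELsmp-1.2.9-1.el4",
]

if __name__ == "__main__":
    for pkg in PACKAGES:
        name, version, release = split_nvr(pkg)
        print(f"{name:35} version={version} release={release}")
```

Read this way, the kernel module packages are at 1.2.9 and the tools and console packages are at 1.2.7.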