help me find out why instance died

From: chao_ping <member_at_dbforums.com>
Date: Sat, 21 Dec 2002 12:34:35 +0000
Message-ID: <2308742.1040474075_at_dbforums.com>

Hi,
I have a RAC 9.2.0.2 database running on Red Hat AS 2.1. It has been up and running for about four months (one month ago, I patched it to 9.2.0.2). Last night, one instance died unexpectedly while the other instance kept running. Although little business was affected, I want to know why it died. I have been unable to find out, so I am looking for your help. Here is some information:
I tested the interconnect and the service network card, and both were running fine (at 7:00 am). The disk system is also OK.

Alert log file:

> :
> ---------------------------------------------------------------------------------
>
> Fri Dec 20 23:38:24 2002
> Thread 2 advanced to log sequence 265
> Current log# 3 seq# 265 mem# 0: /dev/raw/raw8
> Sat Dec 21 04:04:29 2002
> Errors in file /home/oracle/admin/rac/bdump/rac2_lmon_1634.trc:
> ORA-29740: evicted by member 0, group incarnation 7
> Sat Dec 21 04:04:29 2002
> LMON: terminating instance due to error 29740
> Sat Dec 21 04:04:31 2002
> Trace dumping is performing id=[cdmp_20021221040431]
> Sat Dec 21 04:04:34 2002
> Instance terminated by LMON, pid = 1634
> Sat Dec 21 07:38:36 2002
> Starting ORACLE instance (normal)
> Sat Dec 21 07:38:36 2002
>

>
> [oracle_at_rac2 bdump]$ cat /home/oracle/admin/rac/bdump/rac2_lmon_1634.trc
> /home/oracle/admin/rac/bdump/rac2_lmon_1634.trc
> Oracle9i Enterprise Edition Release 9.2.0.2.0 - Production
> With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.2.0 - Production
> ORACLE_HOME = /home/oracle/9.2.0
> System name: Linux
> Node name: rac2
> Release: 2.4.9-e.3smp
> Version: #1 SMP Fri May 3 16:48:54 EDT 2002
> Machine: i686
> Instance name: rac2
> Redo thread mounted by this instance: 0
> Oracle process number: 4
> Unix process pid: 1634, image: oracle_at_rac2 (LMON)
>
> *** SESSION ID:(3.1) 2002-11-22 03:29:38.649
> Batch msg size = 2048
> Batching factor: enqueue replay 48, ack 53
> Batching factor: cache replay 34 size per lock 56
> kjxggin: receive buffer size = 32768
> kjxgmin: SKGXN ver (2 1 Oracle 9i Reference CM)
> CMCLI WARNING: CMInitContext: init ctx(0xacc37e8)
> *** 2002-11-22 03:29:42.243
> kjxgmrcfg: Reconfiguration started, reason 1
> kjxgmcs: Setting state to 0 0.
> *** 2002-11-22 03:29:42.243
> Name Service frozen
> kjxgmcs: Setting state to 0 1.
> kjfcpiora: publish my weight 152022
> kjxgmps: proposing substate 2
> kjxgmcs: Setting state to 6 2.
> Performed the unique instance identification check
> kjxgmps: proposing substate 3
> kjxgmcs: Setting state to 6 3.
> Name Service recovery started
> Deleted all dead-instance name entries
> kjxgmps: proposing substate 4
> kjxgmcs: Setting state to 6 4.
> Multicasted all local name entries for publish
> Replayed all pending requests
> kjxgmps: proposing substate 5
> kjxgmcs: Setting state to 6 5.
> Name Service normal
> Name Service recovery done
> *** 2002-11-22 03:29:43.397
> kjxgmps: proposing substate 6
> kjxgmcs: Setting state to 6 6.
> *** 2002-11-22 03:29:43.507
> *** 2002-11-22 03:29:43.508
> Reconfiguration started
> Synchronization timeout interval: 660 sec
> List of nodes: 0,1,
> Global Resource Directory frozen
> node 0
> release 9 2 0 2
> node 1
> release 9 2 0 2
> res_master_weight for node 0 is 152022
> res_master_weight for node 1 is 152022
> Total master weight = 304044
> Dead inst
> Join inst 0 1
> Exist inst
> Active Sendback Threshold = 50 %
> Communication channels reestablished
> Master broadcasted resource hash value bitmaps
> Non-local Process blocks cleaned out
> Resources and enqueues cleaned out
> Resources remastered 0
> GCS shadows traversed, 0 cancelled, 0 closed
> GCS resources traversed, 0 cancelled
> set master node info
> Submitted all remote-enqueue requests
> kjfcrfg: Number of mesgs sent to node 0 = 0
> Update rdomain variables
> Dwn-cvts replayed, VALBLKs dubious
> All grantable enqueues granted
> *** 2002-11-22 03:29:43.868
> GCS shadows traversed, 0 replayed, 0 unopened
> Submitted all GCS cache requests
> write requests issued in 887 GCS resources
> PIs marked suspect, 0 flush PI msgs
> *** 2002-11-22 03:29:44.116
> Reconfiguration complete
> *** 2002-11-22 03:29:51.261
> kjxgrtmc2: Member 1 thread 2 mounted
> *** 2002-12-21 04:02:05.645
> kjxgrgetresults: Detect reconfig from 0, seq 6, reason 2
> *** 2002-12-21 04:01:57.014
> kjxgrrcfgchk: Initiating reconfig, reason 2
> *** 2002-12-21 04:01:57.014
> kjxgmrcfg: Reconfiguration started, reason 2
> kjxgmcs: Setting state to 6 0.
> *** 2002-12-21 04:01:57.021
> Name Service frozen
> kjxgmcs: Setting state to 6 1.
> *** 2002-12-21 04:04:29.911
> kjxgrdtrt: Evicted by 0, seq (7, 6)
> error 29740 detected in background process
> ORA-29740: evicted by member 0, group incarnation 7
> ksuitm: waiting for [5] seconds before killing DIAG
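
To put a number on the gap in the LMON trace, the "***" timestamps can be pulled out with a short script. The Python below is a minimal sketch, not an Oracle utility: the function names are hypothetical, and it assumes (as in the trace above) that each `*** YYYY-MM-DD HH:MM:SS.fff` line timestamps the lines that follow it. On the quoted lines it reports about 153 seconds between the reconfiguration being initiated (04:01:57.014) and the eviction (04:04:29.911).

```python
from datetime import datetime

# Hypothetical helper, not an Oracle tool. Assumption: every line of the form
# "*** YYYY-MM-DD HH:MM:SS.fff" timestamps the trace lines that follow it.
TS_PREFIX = "*** "
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def timestamped_events(lines):
    """Yield (timestamp, text) for each non-empty line after the first bare timestamp."""
    current = None
    for raw in lines:
        # Drop newsgroup quote markers ("> ") and trailing whitespace.
        line = raw.lstrip("> ").rstrip()
        if line.startswith(TS_PREFIX):
            try:
                current = datetime.strptime(line[len(TS_PREFIX):], TS_FORMAT)
                continue
            except ValueError:
                pass  # e.g. "*** SESSION ID:(3.1) ..." is not a bare timestamp
        if current is not None and line:
            yield current, line


def reconfig_to_eviction(lines):
    """Seconds from the last 'Initiating reconfig' line to the 'Evicted by' line, or None."""
    started = evicted = None
    for ts, text in timestamped_events(lines):
        if "Initiating reconfig" in text:
            started = ts
        elif "Evicted by" in text:  # case-sensitive: skips "ORA-29740: evicted by"
            evicted = ts
    if started is None or evicted is None:
        return None
    return (evicted - started).total_seconds()


if __name__ == "__main__":
    excerpt = """\
> *** 2002-12-21 04:01:57.014
> kjxgrrcfgchk: Initiating reconfig, reason 2
> *** 2002-12-21 04:04:29.911
> kjxgrdtrt: Evicted by 0, seq (7, 6)
"""
    print(reconfig_to_eviction(excerpt.splitlines()))
```

Run against the whole trace file (for example, `open(path).read().splitlines()`), it only measures timing. It does not explain why the other member decided to evict this instance.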

--
Posted via http://dbforums.com

Received on Sat Dec 21 2002 - 13:34:35 CET
