c.d.o.server: Need Expert Help - dbca fails (ORA-27047) on raw vol, 9201 SuSE SLES8
Hi,
I have done this many times on AIX, Solaris, Reliant Unix, HP-UX, and Linux, almost always with third-party clusterware. This time, however, I have to work with Oracle's own cluster manager, which does not look stable, and I am trying to find the root cause.
The problem is that dbca fails at 37%.
create/cloneDBCreation.log:
ORACLE instance started.
Total System Global Area 252776588 bytes
Fixed Size                   450700 bytes
Variable Size             218103808 bytes
Database Buffers           33554432 bytes
Redo Buffers                 667648 bytes
Create controlfile reuse set database rac *
alert_rac1.log:
'/opt/oracle/oradata/rac/undotbs01.dbf' , '/opt/oracle/oradata/rac/users01.dbf' , '/opt/oracle/oradata/rac/xdb01.dbf'
The list of nodes is always 0, irrespective of the number of nodes that are up.
rac1_diag_18951.trc:
*** SESSION ID:(2.1) 2003-09-26 13:21:09.847
CMCLI WARNING: CMInitContext: init ctx(0xabba1fc)
kjzcprt:rcv port created
Node id: 0
List of nodes: 0,
*** 2003-09-26 13:21:09.853
Reconfiguration starts [incarn=0]
I'm the master node
*** 2003-09-26 13:21:09.853
Reconfiguration completes [incarn=1]
CMCLI WARNING: ReadCommPort: received error=104 on recv().
kjzmpoll: slos err[12 CMGroupGetList 2 RPC failed status(-1) respMsg->status(0) 0]
[kjzmpoll1]: Error [category=12] is encountered
CMCLI ERROR: OpenCommPort: connect failed with error 111.
kjzmdreg1: slos err[12 CMGroupExit 2 RPC failed status(1) 0]
[kjzmleave1]: Error [category=12] is encountered
error 32700 detected in background process
OPIRIP: Uncaught error 447. Error stack:
ORA-00447: fatal error in background process
ORA-32700: error occurred in DIAG Group Service
ORA-27300: OS system dependent operation:CMGroupExit failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: 2
ORA-27303: additional information: RPC failed status(1)
ORA-32700: error occurred in DIAG Group Service
ORA-27300: OS system dependent operation:CMGroupGetList failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: 2
ORA-27303: additional information: RPC failed status(-1) respMsg->status(0)
According to the posting
http://lists.suse.com/archive/suse-oracle/2002-Feb/0058.html
logical volumes should not start at 0, but according to yast2 all my LVs start at 0 and end at 124.
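One way to rule out the raw devices themselves (a minimal sketch; `check_read` is just a helper name I made up, and /dev/raw/raw1 is the first device from the setup below) is to verify that the first header-sized chunk of each device is readable with dd at all - if dd cannot read it, ORA-27047 follows regardless of what the cluster layer does:

```shell
#!/bin/sh
# check_read DEV: try to read the first 8 KB of DEV.
# If this fails, Oracle cannot read the file header either (ORA-27047).
check_read() {
    if dd if="$1" of=/dev/null bs=8192 count=1 2>/dev/null; then
        echo "read OK: $1"
    else
        echo "read FAILED: $1"
    fi
}

# Check every raw device bound for the database:
for i in $(seq 1 25); do
    check_read "/dev/raw/raw$i"
done
```

Run as the oracle user, so permission problems on the raw devices show up as well.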
Could it by any chance be something to do with my shared hard drive setup?
I really appreciate any of your comments and ideas.
Here is my 9201 RAC setup
HW: two Dell PowerEdge 1600SC (two Xeon 2.40GHz CPUs each), 1Gbit Ethernet interconnect.
Shared disk: Adaptec 29160 Ultra160 SCSI adapters in both nodes, connected externally to a SEAGATE ST373307LW. (I don't know of any tools to verify that this setup is OK - or must I go for certified shared storage?)
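On verifying the shared disk: a low-tech check (a sketch, with made-up file names) is to save the kernel's view of the SCSI bus on each node, e.g. `cat /proc/scsi/scsi > scsi.nd1` on one node and `scsi.nd2` on the other, copy both files to one node, and compare them - the SEAGATE entry should be identical on both:

```shell
#!/bin/sh
# same_view FILE1 FILE2: compare two saved /proc/scsi/scsi dumps,
# one captured on each node. A mismatch in the shared SEAGATE entry
# means the two adapters do not see the external disk the same way.
same_view() {
    if cmp -s "$1" "$2"; then
        echo "same disk view"
    else
        echo "DIFFERENT disk view"
    fi
}
```

For a stronger check, if sg3_utils is installed, sg_inq can read the disk's INQUIRY data including the serial number, which must match from both nodes.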
SW: SuSE SLES8 (/etc/SuSE-release: SuSE SLES-8 (i386) VERSION = 8.1), kernel 2.4.19-64GB-SMP #1 SMP, with the k_smp-2.4.19-196.i586.rpm patch applied (Oracle is certified on this OS).
I have set up raw partitions with LVM and bound them to /dev/raw/raw*:
$ ls -dl /dev/oracle
drwxrwxrwx 2 root root 4096 2003-09-27 14:33
/dev/oracle
$ ls -dl /dev/oracle/lvol1
brw-rw---- 1 oracle dba 58, 0 2003-09-27 14:33
/dev/oracle/lvol1
... up to 25 vols
$ ls -ld /dev/raw
drwxrwxrwx 2 root root 4096 2003-09-19 17:39 /dev/raw
$ ls -ld /dev/raw/raw1
crw------- 1 oracle dba 162, 1 2003-09-19 17:39
/dev/raw/raw1
... up to 25 raw devices bound the same way.
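For completeness, the binding step can be sketched like this (assuming the 25 lvols above; `raw` is the raw(8) utility from util-linux, and raw bindings do not survive a reboot, so something like this belongs in a boot script):

```shell
#!/bin/sh
# Sketch: emit the commands that bind the 25 LVM logical volumes to
# raw devices and hand them over to the oracle user. Raw bindings are
# lost on reboot, so this would be run from a boot script as root.
bind_raws() {
    i=1
    while [ "$i" -le 25 ]; do
        echo "raw /dev/raw/raw$i /dev/oracle/lvol$i"
        echo "chown oracle:dba /dev/raw/raw$i"
        i=$((i + 1))
    done
}

# Print the commands; pipe through `sh` as root to actually apply them.
bind_raws
```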
The cluster manager is the first component that uses one of the shared disks. I started getting errors when I ran dbca; oracm seemed to be buggy, so I completely deleted the previous install, installed the 9201 cluster manager, applied the 9203 cluster manager patch, and then installed Oracle successfully.
Then I started ocmstart.sh.
cm.log:
oracm, version[ 9.2.0.2.0.41 ] started {Fri Sep 26 14:51:10 2003 }
KernelModuleName is hangcheck-timer {Fri Sep 26 14:51:10 2003 }
OemNodeConfig(): Network Address of node0: 192.168.1.1 (port 9998) {Fri Sep 26 14:51:10 2003 }
OemNodeConfig(): Network Address of node1: 192.168.1.2 (port 9998) {Fri Sep 26 14:51:10 2003 }
HandleUpdate(): NODE(0) IS ACTIVE MEMBER OF CLUSTER, INCARNATION(2) {Fri Sep 26 14:51:13 2003 }
HandleUpdate(): NODE(1) IS ACTIVE MEMBER OF CLUSTER, INCARNATION(1) {Fri Sep 26 14:51:13 2003 }
lsmod shows:
hangcheck-timer 1248 0 (unused)
That usage count of 0 doesn't seem right.
Even after the 9203 patch, watchdogd is still part of ocmstart.sh; I am not sure whether I have to comment it out.
The lsnodes output is wrong (oracm was started on only one node):
$ lsnodes -l
nd1
$ lsnodes (this is wrong; oracm is not running on the other node)
nd1
nd2
$ lsnodes -n
nd1 0
nd2 1
$ lsnodes -v
CMCLI WARNING: CMInitContext: init ctx(0x804ad00)
nd1
nd2
CMCLI WARNING: CommonContextCleanup: closing comm port
$
Received on Sat Sep 27 2003 - 22:46:52 CDT