Re: Newbie Oracle RAC issue

From: Mark Bobak <Mark.Bobak_at_proquest.com>
Date: Wed, 30 Apr 2014 18:52:14 +0000
Message-ID: <CF86BD5E.58F70%Mark.Bobak_at_ProQuest.com>

No, you misunderstand.

What they did, mounting new storage and copying to a new mountpoint, is definitely *not* ok!

What I was saying is that, if you're running LVM or some other volume management solution, it should be easy enough to grow a LUN and then grow the filesystem live. That is what your sysadmins need to do if you're in this situation in the future.
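A minimal sketch of what growing the filesystem in place can look like with LVM, assuming the Oracle mountpoint sits on a logical volume whose filesystem supports online growth (such as ext4 on RHEL6). The volume group, logical volume and disk names are hypothetical, and by default the script only prints the commands. If the existing LUN was grown on the storage side instead of a new disk being added, 'pvresize' on that device would take the place of the first two steps.

```shell
#!/bin/sh
# Sketch: grow an LVM-backed filesystem in place so the Oracle homes never move.
# vg_oracle, lv_u01 and /dev/sdc are hypothetical placeholders.
# By default every command is only printed (RUN=echo); run as root with RUN
# set to an empty string (e.g. RUN= ./grow.sh) to execute for real.
RUN=${RUN-echo}

grow_oracle_fs() {
    vg=$1        # existing volume group
    lv=$2        # logical volume holding the Oracle mountpoint
    newdisk=$3   # newly presented disk or LUN
    $RUN pvcreate "$newdisk"                       # initialise the new disk as an LVM physical volume
    $RUN vgextend "$vg" "$newdisk"                 # add it to the existing volume group
    $RUN lvextend -r -l +100%FREE "/dev/$vg/$lv"   # grow the LV; -r also resizes the filesystem
}

grow_oracle_fs vg_oracle lv_u01 /dev/sdc
```

Because the mountpoint never changes, the Grid and database homes stay at the paths recorded during installation, which is exactly what copying the software to a new mountpoint breaks.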

What they did pretty much hosed you.

If I were onsite, maybe I could try untangling it, but it may be easier for you to wipe and reinstall. And tell the sysadmins I said "What the &*!?&* were you thinking??" :-)

One thing you might try, before wiping and reinstalling, would be to shut everything down (assuming anything is up now, which I'm guessing it's not), remove the old filesystem (the one that had filled up), and remount the new filesystem under the same mountpoint the old one used. I'm not making any promises, but it's worth a quick try. If it's still dead, a total wipe and reload may be easiest.
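One way to sketch that last-ditch attempt, assuming the Oracle homes lived under /u01 on the root filesystem (the Grid home path /u01/app/11.2.0/grid comes from the log in this thread) and that the copy sits on a separate device mounted elsewhere. NEW_DEV and NEW_MNT are hypothetical, and the old tree is renamed rather than deleted so it can be put back. Commands are only printed unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch of "remount the copy under the old name".
# GRID_HOME and OLD_DIR come from the log path in this thread; NEW_DEV and
# NEW_MNT are hypothetical placeholders for wherever the copy now lives.
# Commands are only printed unless run as root with RUN= (empty).
RUN=${RUN-echo}
GRID_HOME=/u01/app/11.2.0/grid
OLD_DIR=/u01
NEW_DEV=/dev/sdc1
NEW_MNT=/u02

remount_plan() {
    $RUN "$GRID_HOME/bin/crsctl" stop crs -f   # make sure nothing from the stack is left up
    $RUN umount "$NEW_MNT"                     # detach the copy from its temporary mountpoint
    $RUN mv "$OLD_DIR" "$OLD_DIR.orig"         # set the old tree aside instead of deleting it
    $RUN mkdir "$OLD_DIR"                      # recreate an empty mountpoint with the old name
    $RUN mount "$NEW_DEV" "$OLD_DIR"           # mount the copy where Oracle expects its homes
}

remount_plan
```

To survive a reboot, the /etc/fstab entry for the new device would also have to point at /u01 instead of its temporary mountpoint.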

-Mark

From: Chris King <ckaj111_at_yahoo.ca>
Reply-To: Chris King <ckaj111_at_yahoo.ca>
Date: Wednesday, April 30, 2014 at 2:45 PM
To: Mark Bobak <Mark.Bobak_at_ProQuest.com>, "oracle-l_at_freelists.org" <oracle-l_at_freelists.org>
Subject: Re: Newbie Oracle RAC issue

The alert log for the remaining node in the grid home says:

Oracle Database 11g Clusterware Release 11.2.0.3.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.
2014-04-30 13:25:39.673
[client(3130)]CRS-2317:Fatal error: cannot get local GPnP security keys (wallet).
2014-04-30 13:25:39.674
[client(3130)]CRS-2316:Fatal error: cannot initialize GPnP, CLSGPNP_ERR (Generic GPnP error).
2014-04-30 13:25:39.684
[client(3130)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/rac1/client/ocrconfig_3130.log.

I executed the command:
$ ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

There were no further lines written to the alert log after this command was issued.

2014-04-30 13:25:35.969: [ OCRCONF][2705233664]ocrconfig starts...
2014-04-30 13:25:36.064: [  OCRMSG][2705233664]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-04-30 13:25:36.064: [  OCRMSG][2705233664]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-30 13:25:36.064: [  OCRMSG][2705233664]prom_connect: error while waiting for connection complete [24]
2014-04-30 13:25:36.064: [ OCRCONF][2705233664]Failure initializing OCR in DEFAULT. Trying REBOOT. err :[PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]]
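Both logs carry Oracle message codes (CRS-2317, CRS-2316, CRS-1013, PROC-32). A small helper like this one, a hypothetical convenience rather than an Oracle tool, pulls out the distinct codes so each can be looked up in the Clusterware error messages documentation.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct CRS-/PROC- message codes found in
# the given log files (or in stdin when no files are given).
error_codes() {
    # -o prints only the matched code, -h drops file names, -E enables ERE
    grep -ohE '(CRS|PROC)-[0-9]+' "$@" | LC_ALL=C sort -u
}
```

For example, on the surviving node: error_codes "$GRID_HOME/log/rac1/alertrac1.log" "$GRID_HOME/log/rac1/client/ocrconfig_3130.log"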

Yes, a disk was added and mounted, and then all the Oracle software was copied to the new mountpoint. So, okay, I'm glad to know this is okay to do, even with the cluster/database running.

On Wednesday, April 30, 2014 2:30:58 PM, Mark Bobak <Mark.Bobak_at_proquest.com> wrote:

What do you see in $GRID_HOME/log/`hostname -s`/alert`hostname -s`.log?

What happens if you do 'crsctl start crs'? What other info do you see in that log file after attempting that command?
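The alert log path Mark describes can be built from the short hostname; the helper functions here are hypothetical. When 'crsctl start crs' fails without writing anything more to the alert log, the OHAS daemon's own log, which in 11.2 lives under $GRID_HOME/log/<host>/ohasd/ohasd.log, is usually the next file to read.

```shell
#!/bin/sh
# Hypothetical helpers for the 11.2 Grid Infrastructure log layout:
#   $GRID_HOME/log/<short hostname>/alert<short hostname>.log
#   $GRID_HOME/log/<short hostname>/ohasd/ohasd.log
# The second argument is optional and defaults to `hostname -s`.
alert_log_path() {
    grid_home=$1
    host=${2:-$(hostname -s)}
    printf '%s/log/%s/alert%s.log\n' "$grid_home" "$host" "$host"
}

ohasd_log_path() {
    grid_home=$1
    host=${2:-$(hostname -s)}
    printf '%s/log/%s/ohasd/ohasd.log\n' "$grid_home" "$host"
}

# Example (GRID_HOME must be set): show the last 50 lines of each log.
# tail -n 50 "$(alert_log_path "$GRID_HOME")"
# tail -n 50 "$(ohasd_log_path "$GRID_HOME")"
```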

When you say "a disk was added and files copied", are you saying they added a disk, mounted a new f/s, and copied stuff over to new mount point? It should be relatively straightforward to grow a filesystem live. I know our admins do it all the time.

-Mark

From: Chris King <ckaj111_at_yahoo.ca>
Reply-To: Chris King <ckaj111_at_yahoo.ca>
Date: Wednesday, April 30, 2014 at 2:21 PM
To: "oracle-l_at_freelists.org" <oracle-l_at_freelists.org>
Subject: Newbie Oracle RAC issue

We had a successful first install of Oracle RAC 11gR2 on RHEL6 in the lab, but we were running out of disk space on the root drive, where the Oracle software is installed. In my absence, a disk was added and files were copied while the cluster/database was running. Subsequently one node crashed and is not recoverable. The remaining node keeps throwing this error when I attempt to start the clusterware:

$ crsctl start cluster
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Start failed, or completed with errors.

I'm unable to start the clusterware. I looked at the log file and saw references to failures reaching the crashed node, so I thought maybe I had to tell the clusterware that we're missing a node, but all the commands I've found to do so require cluster services to be running.

What else should I be looking at to diagnose this? I'm trying to evaluate whether I have to reinstall everything from scratch or whether this lab setup can be salvaged. Thanks!

Also, please note that the following is the only cluster-related process I find running on the remaining node:

root 1557 1 0 11:50 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
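A quick way to run this check, sketched as a hypothetical shell filter over 'ps -ef' output. The daemon names are the usual 11.2 ones and should be treated as an assumption for other releases. On a node where the stack is up you would normally see ohasd.bin and daemons such as ocssd.bin, crsd.bin and evmd.bin; seeing only the 'init.ohasd run' wrapper suggests ohasd.bin itself never started, which fits the CRS-4639 error above.

```shell
#!/bin/sh
# Hypothetical filter: keep only lines that look like 11.2 clusterware processes.
cluster_procs() {
    grep -E 'init\.ohasd|ohasd\.bin|ocssd|cssdagent|cssdmonitor|crsd\.bin|evmd\.bin|gipcd\.bin|gpnpd\.bin|mdnsd\.bin|oraagent|orarootagent' |
        grep -v grep   # drop the grep process itself, whose command line contains the pattern
}

# Usage on the node:
# ps -ef | cluster_procs
```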

--
http://www.freelists.org/webpage/oracle-l

Received on Wed Apr 30 2014 - 20:52:14 CEST
