Re: Anyone tried kill ASM in 11gR2 RAC?

From: LS Cheng <exriscer_at_gmail.com>
Date: Sat, 23 Jan 2010 09:43:02 +0100
Message-ID: <6e9345581001230043x78f03b44n4887ad226545991_at_mail.gmail.com>



Hi

I did some further tests. The results I posted previously were from AIX 6.1 with 11gR2; I have now repeated the test on Linux x86-64:

  1. kill the ASM pmon process
  2. chown root.root the ASM disks
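
As a rough sketch, the two steps look like this (the device path is hypothetical and will differ in your environment; run this only on a disposable test cluster):

```shell
# 1. Kill the ASM pmon process to bring the instance down hard.
kill -9 "$(pgrep -f asm_pmon)"

# 2. Change the disk ownership so that a restarted ASM (running as the
#    grid user) can no longer open the OCR disk.
#    /dev/oracleasm/disks/OCRDISK1 is a hypothetical path -- adjust it.
chown root:root /dev/oracleasm/disks/OCRDISK1
```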

The crsd process dies because it cannot access the OCR (Disk Group not mounted):

2010-01-23 09:38:59.944: [ OCRASM][801350224]proprasmo: The ASM disk group OCR is not found or not mounted
2010-01-23 09:38:59.944: [ OCRRAW][801350224]proprioo: Failed to open [+OCR]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-01-23 09:38:59.944: [ OCRRAW][801350224]proprioo: No OCR/OLR devices are usable
2010-01-23 09:38:59.944: [ OCRASM][801350224]proprasmcl: asmhandle is NULL
2010-01-23 09:38:59.944: [ OCRRAW][801350224]proprinit: Could not open raw device

2010-01-23 09:38:59.944: [  OCRASM][801350224]proprasmcl: asmhandle is NULL
2010-01-23 09:38:59.945: [  OCRAPI][801350224]a_init:16!: Backend init unsuccessful : [26]
2010-01-23 09:38:59.945: [  CRSOCR][801350224] OCR context init failure.
Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +OCR.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +OCR.255.4294967295
ORA-15001: diskgroup "OCR"

] [8]
2010-01-23 09:38:59.945: [ CRSD][801350224][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-01-23 09:38:59.945: [ CRSD][801350224] Done.

crsctl gives error:

[root_at_grid1 ~]# /u01/grid/11.2.0/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

However, ASM and cssd are still up and running.
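
For the record, this is roughly how one can check that (process name patterns are for 11gR2; the Grid home path matches the crsctl output above):

```shell
# ASM pmon and the CSS daemon are both still alive:
ps -ef | egrep 'asm_pmon|ocssd' | grep -v grep

# CSS reports healthy even though CRS does not:
/u01/grid/11.2.0/bin/crsctl check css
```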

So we have a completely different scenario: the same test gives two different results on two operating systems.

Thanks!

On Thu, Jan 21, 2010 at 2:36 PM, Bobak, Mark <Mark.Bobak_at_proquest.com> wrote:

> Yep, makes sense, I think.
>
>
>
> Clusterware starts, and ASM serves up the OCR and voting disk geometry as
> it relates to the raw devices that make up your OCRDATA diskgroup.
> Clusterware caches that info and no longer needs to talk to ASM for it.
>
>
>
> You do the damage, including changing ownership of the devices that make up
> the OCRDATA diskgroup to root:root. But clusterware processes run as root,
> so they can still read/write those raw devices.
>
>
>
> What happens if you chown the devices to root:root, then also chmod 000 all
> those devices?
>
>
>
> -Mark
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *LS Cheng
> *Sent:* Thursday, January 21, 2010 7:44 AM
> *To:* K Gopalakrishnan
> *Cc:* Oracle Mailinglist
> *Subject:* Re: Anyone tried kill ASM in 11gR2 RAC?
>
>
>
> Hi
>
> So even if the OCRDATA Disk Group is not mounted and the physical disks have
> root.root ownership instead of grid.oinstall, Clusterware will stay up and
> running? So basically you mean Clusterware does not need ASM to be up to
> access the OCRDATA disks?
>
> My test was
>
> - kill ASM
> - change the ASM disks (OCRDATA) from grid.oinstall to root.root
> - check Clusterware status, which was still up and running
>
>
>
>
>
> Thanks
>
> On Thu, Jan 21, 2010 at 1:38 PM, K Gopalakrishnan <kaygopal_at_gmail.com>
> wrote:
>
> Clusterware failure will happen _only_ when it cannot access the
> physical devices (disk timeout in css), and shutting down ASM does not
> revoke access to the disks. In your case Clusterware _knows_ the
> location of the OCR/voting information on the ASM disks and can continue
> reading/writing even while the ASM instance is down.
>
> -Gopal
>
>
>
>
>
> On Thu, Jan 21, 2010 at 2:51 AM, LS Cheng <exriscer_at_gmail.com> wrote:
> > Hi
> >
> > I was doing some cluster destructive tests on RAC 11gR2 a few days ago.
> >
> > One of the tests was to kill ASM and see how that affects Clusterware
> > operation, since the OCR and Voting Disks are located in ASM (the OCRDATA
> > Disk Group). After killing ASM nothing happened, as it was quickly
> > started up again. So far so good. The next test was the same, but
> > changing the ASM disk ownership so that when ASM restarts the OCR Disk
> > Group cannot be accessed. Surprisingly, ASM started up and the Database
> > Disk Group was mounted; the OCR Disk Group obviously did not get mounted,
> > but the cluster kept working without any problems.
> >
> > So how is this happening? Doesn't Clusterware need to write and read the
> > Voting Disk every second? I was expecting a Clusterware failure on the
> > node, but everything worked just as if everything were OK.
> >
> > Thanks!
> >
> > --
> > LSC
> >
> >
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Jan 23 2010 - 02:43:02 CST