RE: Oracle clusterware related question

From: Hameed, Amir <Amir.Hameed_at_xerox.com>
Date: Tue, 8 May 2012 11:49:52 -0400
Message-ID: <304F58144267C5439E733532ABC9A3A114E736CF_at_USA0300MS02.na.xerox.net>



Thanks Tim,
The cables remained unplugged for 30 minutes. I am using the default values for the "disktimeout" and "misscount" parameters, which are pasted below:

crsctl get css disktimeout
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.

crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

In my mind, the cluster should have evicted the node after 200 seconds (DTO).
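
To make the arithmetic behind that expectation explicit, here is a minimal sketch that pulls the numbers out of the CRS-4678 lines pasted above and compares them with the 30-minute outage. The helper name css_value is hypothetical (it is not part of crsctl), and the script only compares the figures; it assumes nothing about how CSS actually decides on an eviction.

```shell
#!/bin/sh
# css_value: hypothetical helper that extracts the numeric value from a
# "CRS-4678: Successful get <parameter> <value> ..." line read on stdin.
css_value() {
    sed -n 's/^CRS-4678: Successful get [a-z]* \([0-9][0-9]*\) .*/\1/p'
}

# Sample lines copied from the output above. On a cluster node one would
# pipe the real command instead, e.g. "crsctl get css disktimeout | css_value".
disktimeout=$(echo 'CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.' | css_value)
misscount=$(echo 'CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.' | css_value)

# The cables stayed unplugged for 30 minutes.
outage=$((30 * 60))

echo "outage=${outage}s disktimeout=${disktimeout}s misscount=${misscount}s"
if [ "$outage" -gt "$disktimeout" ]; then
    echo "outage exceeded disktimeout by $((outage - disktimeout))s"
fi
```

With the values above, the outage (1800 seconds) is well past the 200-second disktimeout, which is why I expected an eviction.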
Amir
-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Tim Gorman
Sent: Tuesday, May 08, 2012 11:32 AM
To: oracle-l_at_freelists.org
Subject: Re: Oracle clusterware related question

Amir,
Your phrase "kept showing that the node was still part of the cluster" doesn't mention how long that state lasted. Clearly, from your email, it lasted too long, but equally obviously, at some point the clusterware reacted, and I'm wondering how long that wait was.

With that information about how long the clusterware took to react in mind, I'd suggest using the "crsctl get css" command, as described in the 11.2 docs online:

    crsctl get css

    Use the crsctl get css command to obtain the value of a specific
    Cluster Synchronization Services parameter.

    Syntax

        crsctl get css parameter

    Usage Notes

      * Cluster Synchronization Services parameters include:

            clusterguid
            diagwait
            disktimeout
            misscount
            reboottime
            priority
            logfilesize

      * This command only affects the local server

    Example

    To display the value of the disktimeout parameter for CSS, use
    the following command:

        $ crsctl get css disktimeout
        200

So, you may want to share what the values for "disktimeout" and "misscount" were, and whether those values corresponded at all with your observations.

Hope this helps?

-- 
Tim Gorman
consultant ->  Evergreen Database Technologies, Inc.
postal     =>  PO Box 352151, Westminster CO 80035
website    =>  http://www.EvDBT.com/
email      =>  Tim_at_EvDBT.com
mobile     =>  +1-303-885-4526
fax        =>  +1-303-484-3608
Lost Data? =>  http://www.ora600.be/ for info about DUDE...



On 5/8/2012 8:41 AM, Hameed, Amir wrote:
> Folks,
> I have a three-node Oracle RAC running with Grid version 11.2.0.3. So far
> there is no database created and only CRS is running on all nodes. I am
> using NFS for everything (binaries, OCR & voting disk and database
> files). Each server has two 10GbE NICs for dNFS. The binaries, OCR and
> voting disks are on an aggregated link (two 1GbE NICs). The OS is Solaris
> 10.
>
>
>
> While doing destructive testing to validate configuration and to observe
> behavior in extreme scenarios, when we pulled cables on one RAC server
> from both NICs that are part of the aggregated link for the binaries,
> voting disk and OCR, I was expecting that because CRS would not be able
> to access the voting disks on that node to update its status,
> clusterware would eject that node from the cluster. The "crsctl status
> resource -t" command from the other nodes kept showing that the node was
> still part of the cluster. I am trying to understand this behavior and
> would appreciate if someone can explain it.
>
>
> Thanks
>
> Amir
>
>
> --
> http://www.freelists.org/webpage/oracle-l


--
http://www.freelists.org/webpage/oracle-l


Received on Tue May 08 2012 - 10:49:52 CDT
