Re: Question about resource's start dependency in Clusterware

From: Jure Bratina <jure.bratina_at_gmail.com>
Date: Thu, 30 Jan 2014 11:13:10 +0100
Message-ID: <CAC08BHJMLoT6UJmXXpUg=FmKcKbCmCsOqoqrMNKdpSnygUL=qA_at_mail.gmail.com>



Hi,

> *Regardless of the value of the **AUTO_START** resource attribute for a
> resource, the resource can start if another resource has a hard or weak
> start dependency on it or if the resource has a pullup start dependency on
> another resource.*

Thanks, I think that answers my question. The note in the documentation is, however, in the section dealing with Clusterware startup, so dependencies in the initial startup sequence may be treated differently than when Clusterware is already operational.

> Which seems to indicate that indeed a hard dependency is enough to start
> the other resource.
> But in the same document, Oracle states also:
> *Oracle recommends that resources with **hard** start dependencies also have **pullup** start dependencies.*
> I'm not sure why that is.....

I think the reason for pullup dependencies can be found in the out-of-order startup sequence described here:
http://docs.oracle.com/cd/E11882_01/rac.112/e16794/crschp.htm#CWADD92086 : "When two or more resources depend on each other, a failure of one of them may end up causing the other to fail, as well. In most cases, it is difficult to control or even predict the order in which these failures are detected. For example, even if resource A depends on resource B, Oracle Clusterware may detect the failure of resource B after the failure of resource A.

This lack of failure order predictability can cause Oracle Clusterware to attempt to restart dependent resources in parallel, which, ultimately, leads to the failure to restart some resources, because the resources upon which they depend are being restarted out of order."

And this sentence explains (in my opinion) why pullup start dependencies are needed: "If the attempt to restart resource A fails, then as soon as resource B successfully restarts, Oracle Clusterware reattempts to restart resource A."
So if we take the explanation above and the example from my first post:

[oracle@london1 ~]$ crsctl status resource ora.prod.db -p
NAME=ora.prod.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r-- [...]
SPFILE=+DATA/prod/spfileprod.ora
START_DEPENDENCIES=hard(ora.DATA.dg) [...] pullup(ora.DATA.dg)
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATA.dg)

My understanding is as follows:
The hard start dependency of ora.prod.db on ora.DATA.dg means that ora.DATA.dg must already be running when ora.prod.db is started; if it isn't, it is started automatically (even without the pullup dependency). The pullup start dependency, on the other hand, means that when ora.DATA.dg is started, ora.prod.db should be started as well, provided its TARGET is not OFFLINE (since we don't have the "always" modifier).
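
To see at a glance which dependency kinds and modifiers a resource has, the attribute value can be split into its parts. Below is a minimal Python sketch, assuming the kind[:modifier](target[,target...]) layout seen in the output above. parse_dependencies is a hypothetical helper of mine, not part of any Oracle tool, and the elided "[...]" part of the value is simply skipped.

```python
import re

# Matches e.g. "hard(ora.DATA.dg)" or "pullup:always(ora.DATA.dg)".
# Assumption: modifiers on the dependency kind are colon-separated.
DEP_RE = re.compile(r"(\w+)((?::\w+)*)\(([^)]*)\)")


def parse_dependencies(value):
    """Return a list of (kind, modifiers, targets) tuples (hypothetical helper)."""
    deps = []
    for kind, mods, targets in DEP_RE.findall(value):
        modifiers = [m for m in mods.split(":") if m]
        deps.append((kind, modifiers, targets.split(",")))
    return deps


# Value copied from the crsctl output above, including the elided part.
start_deps = parse_dependencies("hard(ora.DATA.dg) [...] pullup(ora.DATA.dg)")

# True only if some pullup dependency carries the "always" modifier.
has_pullup_always = any(
    kind == "pullup" and "always" in mods for kind, mods, _ in start_deps
)

print(start_deps)
print(has_pullup_always)
```

For the value shown above this reports a hard and a pullup dependency on ora.DATA.dg, neither with the "always" modifier.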

Now, if a failure occurs and Clusterware tries to start those two resources out of order, as described in the documentation above, the pullup dependency is the mechanism that handles the problem automatically. For example, suppose the ora.DATA.dg resource fails because the ASM instance crashes. Because of the hard stop dependency on ora.asm (which is now in neither the online nor the intermediate state), and ultimately because the database can't run without ASM (assuming, of course, that the database files are in ASM), the ora.prod.db resource also fails. If Clusterware then tries, for whatever reason, to start ora.prod.db before ora.DATA.dg, the start of ora.prod.db fails because ora.DATA.dg can't be started yet. However, once the ASM instance starts and ora.DATA.dg is brought online (by the ASM instance dependency mechanism), the pullup(ora.DATA.dg) dependency triggers another attempt to start ora.prod.db, which now succeeds (although I'm not sure what happens without the "always" modifier in this case). Without the pullup dependency, this second attempt to start ora.prod.db wouldn't happen and the resource would remain offline.
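
The scenario can be played through with a toy model. This Python sketch only encodes my reading of the documentation, not Clusterware's actual algorithm. The Resource class, try_start and the "startable" set (a stand-in for "ASM is back, so the disk group can be mounted") are all made up for illustration.

```python
class Resource:
    def __init__(self, name, hard=(), pullup=()):
        self.name = name
        self.hard = list(hard)      # must be ONLINE before this resource starts
        self.pullup = list(pullup)  # their start triggers a start of this resource
        self.state = "OFFLINE"
        self.target = "ONLINE"      # the resource is supposed to be running


def try_start(res, registry, startable):
    """Attempt a start; fail if a hard dependency cannot be brought online."""
    for dep_name in res.hard:
        dep = registry[dep_name]
        if dep.state != "ONLINE" and not try_start(dep, registry, startable):
            return False
    if res.name not in startable:
        return False
    res.state = "ONLINE"
    # Pullup: re-attempt offline dependents whose TARGET is ONLINE.
    for other in registry.values():
        if (res.name in other.pullup and other.state == "OFFLINE"
                and other.target == "ONLINE"):
            try_start(other, registry, startable)
    return True


def scenario(with_pullup):
    dg = Resource("ora.DATA.dg")
    db = Resource("ora.prod.db", hard=["ora.DATA.dg"],
                  pullup=["ora.DATA.dg"] if with_pullup else [])
    registry = {r.name: r for r in (dg, db)}
    # Both resources have failed; ASM is still down, so the disk group
    # cannot start yet. The database is restarted first (out of order).
    startable = {"ora.prod.db"}
    first_attempt = try_start(db, registry, startable)
    # ASM comes back and the disk group is brought online.
    startable.add("ora.DATA.dg")
    try_start(dg, registry, startable)
    return first_attempt, db.state


print(scenario(with_pullup=True))   # (False, 'ONLINE')
print(scenario(with_pullup=False))  # (False, 'OFFLINE')
```

In both runs the first, out-of-order start of ora.prod.db fails. Only with the pullup dependency does the later start of ora.DATA.dg bring the database back online.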

Maybe my example wasn't the most appropriate, since stopping ASM in 11.2 has other implications if OCR is stored in it (http://docs.oracle.com/cd/E11882_01/rac.112/e41960/srvctladmin.htm#RACAD5043 : "You cannot use this command when OCR is stored in Oracle ASM because it will not stop Oracle ASM. To stop Oracle ASM you must shut down Oracle Clusterware."), but a similar scenario would probably apply to two other dependent resources, neither of which depends on ASM.

Regards,
Jure

On Wed, Jan 29, 2014 at 9:33 PM, D'Hooge Freek <Freek.DHooge_at_exitas.be> wrote:

>
> Hi,
>
> In the documentation I found following note
>
> *Regardless of the value of the **AUTO_START** resource attribute for a
> resource, the resource can start if another resource has a hard or weak
> start dependency on it or if the resource has a pullup start dependency on
> another resource.*
>
> Which seems to indicate that indeed a hard dependency is enough to start
> the other resource.
> But in the same document, Oracle states also:
>
> *Oracle recommends that resources with **hard** start dependencies also
> have **pullup** start dependencies.*
>
> I'm not sure why that is.....
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jan 30 2014 - 11:13:10 CET
