Re: oracle clusterware: stonith

From: Jeremy Paul Schneider <jeremy.schneider_at_ardentperf.com>
Date: Tue, 26 Jun 2007 10:44:13 -0500
Message-ID: <18be0f260706260844m33e45497m4e91a5ab22a8e272@mail.gmail.com>

Depends on your definition of "stonith" I guess. :) Oracle does seem to like redefining terms (like, say, "grid" computing...). It looks like the linux-ha project was using the term before Oracle though, and they usually mean hardware (power) based solutions, as do most other people. That technique is not used by RAC directly (although Oracle can integrate with vendor-based clusterware that might support it). Oracle clusterware itself definitely doesn't do that.

Kevin's point was that a local userland-based reset is not failsafe. (He's not talking about hangcheck-timer but rather CRS-based reboots. By the way, it's CRITICAL that you run hangcheck-timer on linux/RAC deployments.) CRS doesn't reboot right away, and it doesn't stop requests that are already queued in the scsi drivers from hitting disk.

However he kinda skips past hangcheck-timer. To be fair it's really a race condition... there are two userland processes: the process updating the hangcheck clock and the CRS process. If neither can get processor time then the machine reboots (the hangcheck clock goes stale). If both can get processor time then the machine also reboots (CRS is able to do its reset). The main problem would be if hangcheck gets processor time but CRS does not: the clock keeps getting updated, so hangcheck-timer never fires, and CRS never gets to do its reboot.
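
To make those cases concrete, here is a toy sketch in Python. It only models the reasoning in the paragraph above; it is not how CRS or hangcheck-timer are implemented. The function name and the return strings are made up for illustration, and the fourth case (CRS gets CPU but the hangcheck updater does not) is not covered in the text, so treating it as a reboot is an assumption.

```python
from itertools import product


def node_outcome(hangcheck_gets_cpu: bool, crs_gets_cpu: bool) -> str:
    """Toy model of a node that is supposed to be reset (hypothetical helper)."""
    if crs_gets_cpu:
        # CRS can run, so it is able to carry out its own reboot.
        return "reboot"
    if not hangcheck_gets_cpu:
        # Nobody updates the hangcheck clock, so hangcheck-timer
        # sees a hang and resets the machine.
        return "reboot"
    # The hangcheck clock is kept fresh but CRS is starved:
    # neither mechanism resets the node.
    return "no reboot"


# Walk through all four combinations of the two processes.
for hangcheck, crs in product([False, True], repeat=2):
    print(f"hangcheck={hangcheck!s:5} crs={crs!s:5} -> {node_outcome(hangcheck, crs)}")
```

The only combination that comes out as "no reboot" is the one called out above as the main problem.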

Also, that situation by itself is not yet a split-brain. On RAC you won't get corruption until the first node has performed instance recovery. Up until that point the cluster just hangs until it has figured out the cluster status. So it's another race condition: the critical question is not how long it takes Oracle to reboot the node, but rather what the relationship is between node 1 rebooting and node 2 starting recovery. It would be a problem if the alive node starts recovery but the "dead" node isn't completely "dead" yet. Power-based solutions are simple and guaranteed. Software-based solutions are way more complicated. I tend to think that simple is better.
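
The ordering condition above can be written down as a tiny sketch in Python. This is just an illustration of the race, under the assumption that each event can be reduced to a single timestamp; the function name, parameter names and the example numbers are all hypothetical and do not come from any Oracle tool.

```python
def split_brain_window(dead_node_io_stops_at: float,
                       recovery_starts_at: float) -> float:
    """Seconds during which the "dead" node can still write to disk
    while the surviving node is already doing instance recovery.

    Both arguments are seconds since the failure was detected
    (hypothetical values, for illustration only).
    """
    return max(0.0, dead_node_io_stops_at - recovery_starts_at)


# Safe ordering: the evicted node is fully stopped before recovery begins.
print(split_brain_window(dead_node_io_stops_at=45.0, recovery_starts_at=60.0))

# Unsafe ordering: recovery begins while the evicted node can still do I/O.
print(split_brain_window(dead_node_io_stops_at=75.0, recovery_starts_at=60.0))
```

A power-based reset pins the first timestamp down from the outside; a software-based reset leaves it up to the node that is already misbehaving.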

As Barb pointed out in the presentation, the default timeout for CRS is 200s, or 600s if you're using vendor clusterware. If anything, this also reinforces Kevin's paper from a while ago about clusterware... saying that vendor-based clusterware is FAR from being out of the game. I think that this is one clear advantage of using something like serviceguard or hacmp: more mature clusterware.

Also, as a disclaimer, I've worked a fair amount with RAC but I don't consider myself an "expert"... there are lurkers on oracle-l that know a lot more than I do. And I don't know the exact mechanics of every failure situation in RAC.

Just my two cents...

-Jeremy

On 6/25/07, Pedro Espinoza <raindoctor_at_gmail.com> wrote:
>
> Does oracle clusterware remotely power reset? If I recall correctly
> from what I have read from Kevin Closson's blog, it is not so.
> However, this presentation by an oracle insider claims that they
> support stonith.
>
> On slide 11:
>
> IO fencing via stonith algorithm (remote power reset).
>
> http://oukc.oracle.com/static05/opn/oracle9i_database/40168/053107_40168_source/index.htm
>
> Thanks, Pedro.
> --
> http://www.freelists.org/webpage/oracle-l

-- 
Jeremy Schneider
Chicago, IL
http://www.ardentperf.com/category/technical


Received on Tue Jun 26 2007 - 10:44:13 CDT