RE: Solaris and Veritas clustering training

From: <Christopher.Taylor2_at_Parallon.com>
Date: Tue, 23 Jul 2013 14:47:22 -0500
Message-ID: <F05D8DF1FB25F44085DB74CB916678E887A4566225_at_NADCWPMSGCMS10.hca.corpad.net>



I'm surprised you have to freeze the group before taking it offline.

For example, Windows Clustering (which is actually pretty good) behaves similarly:

a.) If you shut down a resource that IS clustered, using sqlplus for example, then boom, you've just failed over your group (assuming the cluster doesn't restart it on the same node; some of this depends on the number of restart attempts you have defined).

b.) But if you go into the cluster manager (think hagrp/hares), you can take a group offline, move it or whatever. If you take it offline within the cluster, it doesn't try to fail it over.

I'd be surprised if Veritas forces you to freeze it when you're doing all the operations within the cluster API. Perhaps they do, and if so they could take a lesson from Microsoft there, believe it or not.

Chris

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Norman Dunbar
Sent: Tuesday, July 23, 2013 2:28 PM
To: oracle-l_at_freelists.org
Subject: Re: Solaris and Veritas clustering training

Evening all,

On 23/07/13 19:30, Dennis Williams wrote:
> Malcolm,
> The big one is to tattoo on the back of your hands: "Freeze Before
> Shutdown".

Yes, yes, yes, yes! And have it tattooed on the back of your eyelids as well, for good measure!

I recently had to learn Veritas Clustering commands very quickly indeed, as the project I was working on decided to use Veritas Clustering on Linux. I managed to shut down a database before knowing that I had to freeze the cluster. The brown stuff had an unscheduled interaction with the air conditioning!

Some of the Veritas commands are weird. They don't work properly (i.e., they hang) if piped through grep, for example. I have no idea which ones were affected, sorry.
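
If a command does hang when piped straight into grep, one thing that might be worth trying is to capture its output in a temporary file first and then search the file. The sketch below shows that pattern only; whether it actually avoids the hang on any given Veritas command is untested. The helper name filter_output is made up for illustration and is not part of Veritas.

```shell
#!/bin/sh
# Sketch: run a command, save its output to a temp file, then grep the file,
# instead of connecting the command directly to grep with a pipe.
# filter_output is a hypothetical helper name, not a Veritas tool.

filter_output() {
    pattern="$1"
    shift
    tmp=$(mktemp) || return 1
    # Run the command given in the remaining arguments; keep stdout and stderr.
    "$@" > "$tmp" 2>&1
    # Search the saved output; grep's exit status tells us if anything matched.
    grep -- "$pattern" "$tmp"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

For example, filter_output Flintstone hagrp -list would search the saved group list for the group name Flintstone.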

As a quick overview, and someone will correct me if I'm wrong, I hope, and bear in mind that I had to learn this on the fly:

There are resource groups and resources. The hagrp command acts on groups while hares acts on resources. Groups are made up of resources - so a group called Flintstone can be made up of the resources Barney, Fred, Wilma, Betty and a database and listener to go with it. If you shut down any of these, without freezing the group, then the whole group will go into a failure (or partial) state on the current server, and attempt to fail over to the next one it is configured to run on.

Such fun when you stop a database and run a command which works fine, but the next time you run it, or another one, it fails. The whole lot has gone somewhere else!

The following might be useful:

hagrp -list = list all the groups.
hares -list = list all the resources.

hagrp -freeze -group group_name -sys server_name = freeze the group.
hagrp -unfreeze -group group_name -sys server_name = unfreeze the group.

hagrp -offline -group group_name = take group offline.
hares -offline -resource resource_name = take resource offline. (not too sure I remember the syntax here, beware!)

hagrp -online -group group_name = put group online.
hares -online -resource resource_name = put resource online. (not too sure I remember the syntax here, beware!)

hagrp -clear -group group_name -sys server_name = clear failed status for given group. I think there's a hares option to do the same for a resource, but I can't quite remember!

hagrp -state -group group_name = show state of the group's resources = online, offline, partial or failed. Hares ditto?

hagrp -display = display detailed system info.

hagrp -switch -group group_name -sys server_name = fail the given group from the current server to the supplied server. Can be a PITA if it fails on the supplied one, fails over automagically to another and then, eventually, fails back to the one it's on now. Ask me how I know!!!

There's also a hastatus command:

hastatus -summary = shows the summary of all the groups
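
As a rough sketch of the "Freeze Before Shutdown" routine built from the commands above: freeze the group, do the database work, unfreeze the group, then check the summary. The helper names (run, maintain_group), the DRY_RUN switch and the restart_db.sh script are made up for illustration. It uses the positional form "hagrp -freeze group_name" rather than the -group spelling recalled above; treat that as an assumption and check it against the hagrp man page for your release. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: wrap a maintenance command in a freeze/unfreeze of a VCS group.
# Assumptions: positional group name for hagrp -freeze / -unfreeze, and the
# hypothetical helpers below. DRY_RUN=1 (the default) prints commands only.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # show what would be executed
    else
        "$@"             # really execute it
    fi
}

maintain_group() {
    if [ "$#" -lt 2 ]; then
        echo "usage: maintain_group group_name command [args...]" >&2
        return 2
    fi
    group="$1"
    shift
    # Freeze first, so stopping the database does not trigger a failover.
    run hagrp -freeze "$group" || return 1
    # The caller's maintenance step, e.g. a script that bounces the database.
    run "$@"
    status=$?
    # Always unfreeze afterwards, even if the maintenance step failed.
    run hagrp -unfreeze "$group"
    run hastatus -summary
    return "$status"
}
```

For example, maintain_group Flintstone ./restart_db.sh prints the four commands in order; set DRY_RUN=0 to execute them for real once the syntax has been confirmed.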

The logs are about as much use as a chocolate teapot when something goes wrong. They can be found, on Linux anyway, in /var/VRTSVCS/*.log and, in the event of a failure, they usually contain something "meaningful" like:

Service X failed on system Y.
Service X running on system Z.

No help at all!

HTH

Cheers,
Norm.

--
Norman Dunbar
Dunbar IT Consultants Ltd

Registered address:
Thorpe House
27a Lidget Hill
Pudsey
West Yorkshire
United Kingdom
LS28 7LG

Company Number: 05132767
--
http://www.freelists.org/webpage/oracle-l


Received on Tue Jul 23 2013 - 21:47:22 CEST
