Oracle Grid Infrastructure - Rebootless Node Fencing

Comments

Hi, Mike. I've been trying to find more information about the implementation of I/O fencing without a reboot that you describe. So far, all I can find are techniques for doing it when IPMI hardware is installed and configured. Are you saying that this is implemented without IPMI? Can you point me to any docs?
Thank you for bringing this topic up.
John.

Hello John, my apologies for the late response. Rebootless node fencing in Oracle 11g R2 Grid Infrastructure removes a member from the cluster (a member kill) without terminating the node itself. The cluster reforms without the victim node, that is, the node that has lost the network heartbeat (or has timed out accessing the voting disks). The kill-block code is executed by the offending node to take that node out of the cluster without rebooting it. In all of these rebootless fencing cases, the member kill is coordinated by Clusterware (on the local or a remote node) together with the operating system on the victim node. In certain cases, however, the member kill must be escalated to node termination without waiting for (or in the absence of) Clusterware and the operating system. In those cases the node has to be terminated through IPMI, which can power-cycle the server with remote commands.
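The escalation described above (a rebootless member kill first, an IPMI power cycle only when Clusterware and the OS on the victim cannot complete it) can be modeled as a small decision function. This is a minimal sketch, not Oracle's CSSD code: `fence_node`, `member_kill`, `ipmi_power_cycle` and the 30-second `kill_timeout_s` default are hypothetical names and values chosen for illustration. The two callables stand in for the Clusterware/OS-coordinated kill and for the BMC power cycle.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model of the escalation logic; NOT Oracle Clusterware's
# actual implementation. All names and the timeout value are illustrative.


@dataclass
class FencingOutcome:
    node: str
    method: str  # "member_kill" (rebootless) or "ipmi_power_cycle"


def fence_node(
    node: str,
    member_kill: Callable[[str, float], bool],
    ipmi_power_cycle: Callable[[str], None],
    kill_timeout_s: float = 30.0,  # hypothetical timeout
) -> FencingOutcome:
    """Fence a victim node, preferring a rebootless member kill.

    member_kill(node, timeout) stands in for the kill coordinated by
    Clusterware and the victim's OS; it returns True only if the victim
    confirmed the member kill within the timeout.
    """
    if member_kill(node, kill_timeout_s):
        # The victim left the cluster; the server itself keeps running.
        return FencingOutcome(node, "member_kill")
    # Clusterware/OS on the victim did not cooperate in time: escalate to
    # node termination through the BMC, which does not depend on the OS.
    ipmi_power_cycle(node)
    return FencingOutcome(node, "ipmi_power_cycle")
```

For example, `fence_node("rac2", lambda n, t: False, print)` calls the power-cycle stand-in (here just `print`) with `"rac2"` and returns an outcome whose `method` is `"ipmi_power_cycle"`. Keeping the two fencing mechanisms as injected callables makes the ordering (rebootless first, hard reset last) the only thing the function decides.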

IPMI (Intelligent Platform Management Interface) deserves a separate article. IPMI lets you manage a system remotely even when the OS is not running, so Clusterware can use it to reboot a node for I/O fencing. To configure IPMI in 11g R2, the power must be on, the host must be on the network (the IPMI interface needs its own IP address, ideally on the management network and preferably assigned by DHCP), and the server must have a Baseboard Management Controller (BMC) with firmware compatible with IPMI 1.5. IPMI also needs a username and password, which are used during a node eviction. The CSSD of the larger sub-cluster (the evicting side) communicates over the LAN with the BMC of the node in the smaller sub-cluster (the node to be evicted, i.e. the victim node), using that username and password to reboot it.
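CSSD talks to the victim's BMC internally, so there is nothing to script in the eviction path itself. A quick way to confirm that a node's BMC is reachable over the LAN and accepts the configured credentials, though, is the open-source `ipmitool` utility; that tool is an assumption here, since the thread does not mention it. The sketch below builds its command line: `-I lan` selects the IPMI 1.5 LAN interface, and `-E` makes ipmitool read the password from the `IPMI_PASSWORD` environment variable. `build_chassis_power_cmd` and `chassis_power` are hypothetical helper names.

```python
import os
import subprocess
from typing import List

# Hypothetical helpers around the ipmitool CLI; not part of Oracle Clusterware.
# "status" only queries the power state; "cycle" really resets the server.
ALLOWED_ACTIONS = {"status", "cycle"}


def build_chassis_power_cmd(bmc_ip: str, ipmi_user: str,
                            action: str = "status") -> List[str]:
    """Build an ipmitool command addressed to a node's BMC.

    '-I lan' uses the IPMI 1.5 LAN interface; '-E' reads the password from
    the IPMI_PASSWORD environment variable, keeping it off the command line.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["ipmitool", "-I", "lan", "-H", bmc_ip, "-U", ipmi_user, "-E",
            "chassis", "power", action]


def chassis_power(bmc_ip: str, ipmi_user: str, ipmi_password: str,
                  action: str = "status") -> subprocess.CompletedProcess:
    """Run the command against the BMC and capture ipmitool's output."""
    env = dict(os.environ, IPMI_PASSWORD=ipmi_password)
    return subprocess.run(build_chassis_power_cmd(bmc_ip, ipmi_user, action),
                          env=env, capture_output=True, text=True)
```

With the default `"status"` action, `chassis_power("192.0.2.10", "ipmiadmin", "secret")` only reports the power state, which makes it a safe connectivity and credential check; `action="cycle"` performs the same kind of hard reset the evicting CSSD would request. The address, user name and password are placeholders.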

Michael Rajendran
http://www.unbreakablecloud.com