Which node will be evicted? (Oracle 10.2.0.4.0, AIX 5.3 64bit) [message #546322] Tue, 06 March 2012 02:58
snowball
Messages: 229
Registered: April 2006
Location: China
Senior Member

Hi, folks
We have a two-node RAC (10.2.0.4) on AIX 5.3 with HACMP.
Node 1 was the master node at first. During a period of high memory usage, available memory dropped to around 100 MB and the run queue became very long.
After that, we could no longer log in to the OS on node 1, and the instance on node 1 crashed. About 20 minutes later, the OS rebooted.

In ocssd.log on node 1 we found only the CRS startup information (written after the OS reboot), whereas ocssd.log on node 2 shows the eviction being started from node 2, after which node 2 was the only node in the cluster.

Here is the timeline:
* 19:30 ~ 19:40: both nodes went to high memory usage, up to 100%, and both had a high run queue, with node 1's higher.
Node 1 was the master node at this point.

* 19:45 ~ 20:00
19:45 ==> According to ocssd.log on node 2, node 2 started to evict node 1. The ocssd.log on node 1 contains only the CRS startup information written after node 1's OS reboot. Node 2 became the master node.
20:00 ==> The interconnect on node 1 was not working. AIX was taking a system dump.

* 20:00 ~ 20:20: node 1's OS was rebooting.
20:20 ==> Node 1 rejoined the cluster and returned to normal.
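
To make the gaps in this timeline explicit, here is a minimal Python sketch that computes the minutes between consecutive events. The times are the ones given above. The event descriptions are paraphrased, the names `EVENTS`, `minutes_between` and `gaps` are made up for illustration, and it assumes everything happened on the same evening.

```python
from datetime import datetime

# Times as reported in the timeline above; descriptions are paraphrased.
# Assumption: all events fall on the same day, so only HH:MM is compared.
EVENTS = [
    ("19:30", "high memory usage and run queue start on both nodes"),
    ("19:45", "node 2 starts evicting node 1 (per node 2's ocssd.log)"),
    ("20:00", "node 1 interconnect down, AIX taking a system dump"),
    ("20:20", "node 1 rejoins the cluster"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def gaps(events):
    """Pair each event with the next one and compute the gap in minutes."""
    return [
        (earlier_desc, later_desc, minutes_between(earlier_time, later_time))
        for (earlier_time, earlier_desc), (later_time, later_desc)
        in zip(events, events[1:])
    ]


for earlier, later, mins in gaps(EVENTS):
    print(f"{mins:3d} min: {earlier} -> {later}")
```

The second gap is the one in question: about 15 minutes between node 2 starting the eviction and node 1 going down for the dump and reboot.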


I have several questions about this scenario:
1. Since node 1 was the master node, why is there nothing in its ocssd.log showing that node 1 evicted node 2, or even tried to?
2. In this high-load situation, what is the key information that decides which node evicts which?
3. Can there be a delay between the eviction and the OS reboot it causes? Node 2 started to evict node 1 at 19:45, but node 1 rebooted at 20:00, about 15 minutes later.

Thanks very much.
Re: Which node will be evicted? [message #546348 is a reply to message #546322] Tue, 06 March 2012 04:39
John Watson
Messages: 4693
Registered: January 2010
Location: Global Village
Senior Member
You may need to set the DIAGWAIT CSS parameter to get better logging of eviction events. But if you are using HACMP, the evictions may be managed by HACMP, not by Oracle.
In general, if you are running your nodes at 100% CPU usage, you will get evictions. This isn't Oracle's fault: it is yours, for overloading the system to such a degree that an eviction is necessary to preserve integrity. Memory usage is irrelevant because AIX will always use all available memory; this is a Good Thing and nothing to worry about. Saturated CPU and interconnect traffic are what matter.
Re: Which node will be evicted? [message #546353 is a reply to message #546348] Tue, 06 March 2012 04:48
snowball

Hi, John
Thanks for your kind support.
The clusterware in use is Oracle Clusterware, not HACMP. Is the eviction still managed by HACMP?
It seems that Oracle suggests setting DIAGWAIT to 13. Is this enough to log eviction events?

Thanks very much.
Re: Which node will be evicted? [message #546475 is a reply to message #546353] Tue, 06 March 2012 23:58
snowball

I got the answer from ocssd.log: HACMP is being used as the clusterware.
Thanks again.

Re: Which node will be evicted? [message #546649 is a reply to message #546475] Wed, 07 March 2012 23:42
snowball

I am still confused about the roles of the two clusterwares (Oracle Clusterware and a vendor clusterware such as HACMP) when both are present in a RAC.
If HACMP is in use, what does Oracle Clusterware do?
If Oracle Clusterware is in use, what does HACMP do for RAC?

Unfortunately, I could not find any document that explains their relationship and functions.