RAC cluster crashes
Date: Mon, 20 Apr 2009 16:15:21 -0400
We have been experiencing crashes on our RAC cluster. It's a 2-node cluster running Oracle 10.2.0.4 SE, RAC 10.2.0.4 and ASM 10.2.0.4 on Solaris T5220 servers running Solaris 5.10. In every case, the affected server (or servers) shuts down and reboots, and everything then starts up fine.
The crashes occur roughly every 7 to 10 days. They never occur during prime working hours. They mostly happen around midnight or in the early morning, and once on a Sunday night. Most of the time one server reboots and the other stays operational. Other times both reboot, usually about an hour apart.
There was a known problem with NTP (Network Time Protocol), and a patch was provided. We installed it on our test cluster, but it doesn't appear to have helped. We are working with Oracle Support, but so far they have not found anything conclusive. They think it may be a network issue.
I've been looking for a job that might be running at the time of the crashes. So far the only one I've come across is a system backup, and I haven't found anything else to go on.
Has anyone experienced this problem and can provide some insight?
Thanks,
ed

Received on Mon Apr 20 2009 - 15:15:21 CDT