Re: rac cluster crashes
Date: Mon, 20 Apr 2009 19:07:17 -0300
There is most likely an instance eviction that is causing the server to reboot. Take a look at the instance eviction troubleshooting notes on Metalink; there are several useful ones.
Also, check the easy things first. Network connectivity on the interconnect: is it switched or direct? Are you certain it's working? How is the load on the node that rebooted?
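If you want something to leave running overnight to see whether the interconnect drops out around the time of the reboots, a rough sketch along these lines might help. It is only an illustration, not an Oracle tool: the peer name `node2-priv` and port 22 are placeholders I made up, and it times a TCP connect rather than the actual cluster heartbeat, so treat the numbers as a hint only.

```python
import socket
import time
from statistics import mean


def probe_once(host, port, timeout=1.0):
    """Time one TCP connect to host:port; return milliseconds, or None on failure.

    Any OSError (timeout, refused, unreachable) is counted as a failed probe,
    so the port must be one that normally accepts connections.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None


def summarize(samples):
    """Summarize a list of probe results (milliseconds, or None for a failure)."""
    ok = [s for s in samples if s is not None]
    lost = len(samples) - len(ok)
    return {
        "sent": len(samples),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
        "avg_ms": mean(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


if __name__ == "__main__":
    # Hypothetical values: replace with the other node's private
    # interconnect name and a port known to be listening there.
    PEER, PORT = "node2-priv", 22
    while True:
        # One probe per second, one summary line per minute.
        window = []
        for _ in range(60):
            window.append(probe_once(PEER, PORT))
            time.sleep(1)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), summarize(window), flush=True)
```

Run it on each node pointing at the other, redirect the output to a file, and compare the timestamps against the reboot times. A rise in `loss_pct` or `max_ms` just before a reboot would point toward the network or toward a heavily loaded node.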
Oracle Certified Professional
On Mon, Apr 20, 2009 at 5:15 PM, ed lewis <eglewis71_at_gmail.com> wrote:
> We have been experiencing crashes on our
> RAC cluster. It's a 2-node cluster, Oracle 10.2.0.4 SE,
> RAC 10.2.0.4, ASM 10.2.0.4, on Solaris servers (T5220),
> running 5.10. In all cases the server or servers shut down,
> reboot, and everything starts up fine.
> The crashes occur around every 7-10 days. They never
> occur during prime working hours. It's mostly around midnight,
> or early morning, and once on Sunday night. Most of the time
> 1 server reboots, and the other stays operational. Other times they both
> reboot, usually around an hour apart.
> There was a known problem with NTP (network time protocol), and a patch
> was provided.
> We installed it on our test cluster, but it appears that it didn't help.
> We are working with Oracle support, but so far they have not
> found anything conclusive. They think it may be a network issue.
> I've been looking for a job that may be running at the time
> of the crash. So far, I've come across a system backup being done,
> but have not found anything to go on.
> Has anyone experienced this problem who can provide some insight?
> thanks ed