Oracle FAQ Your Portal to the Oracle Knowledge Grid


Oracle 10g RAC crashes and VMware ESX Server

From: Chuck Whealton <chuck_whealton_at_yahoo.com>
Date: 28 Feb 2007 17:56:05 -0800
Message-ID: <1172714165.449212.191460@a75g2000cwd.googlegroups.com>


Everybody:

I've searched the VMTN forums and found no solutions to this one (just another similar question with no resolution) and I found nothing of any use on Oracle's Metalink. I also looked at some of the postings in this newsgroup, but I saw nothing along the lines of what I'm doing and the problems I'm having. This posting is mainly a copy of a posting I made in the VMware Technology Network discussion groups.

I'm running ESX 3.0 on a couple of BL20p G3 blades. The storage is on an EVA5000 disk array (3.025 firmware), using two separate fabrics. The shared storage was created on a vDisk and zeroed, then connected to each VM with the proper support for shared storage (i.e., virtual bus, separate controller, etc.). I have two Windows 2003 Server VMs, an internal VM network with private IP addressing for the Oracle RAC interconnect, and a standard public network. I'm on Oracle 10g (10.2.0.3.0).

This problem ONLY happens with my RAC nodes, so I'm 99% sure it's cluster related, as I have no problems with standalone Oracle databases on VMs.

My W2K3 virtual machines crash at random. I can't find anything useful in the VMware logs (vmkernel, hostd.log, etc.). However, if I look at the vmware.log for one of the nodes, the crashes start off with a "vcpu-0| CPU reset| soft". That's it, and that's not enough. The Oracle alert logs have nothing of any use beyond the usual message from the remaining node indicating that it has lost its connection with a member.
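One way to line up those reset markers against the Oracle alert log timestamps is to pull every occurrence out of vmware.log together with the few lines that precede it. The following is a minimal Python sketch, assuming the marker appears in the form quoted above ("vcpu-0| CPU reset| soft") and that the log is plain text with one entry per line. The function name `find_cpu_resets` and the `context` parameter are illustrative only and are not part of any VMware tool.

```python
import re

# Matches the marker quoted in the post, for any vCPU number.
# Assumption: the log line contains "vcpu-<n>|" followed by "CPU reset".
RESET_PATTERN = re.compile(r"vcpu-\d+\|\s*CPU reset")


def find_cpu_resets(lines, context=3):
    """Return a list of (line_number, preceding_lines, line) tuples.

    lines   -- any iterable of log lines (an open file works)
    context -- how many lines before each reset marker to keep
    """
    hits = []
    history = []  # rolling window of the most recent lines
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if RESET_PATTERN.search(line):
            hits.append((number, list(history), line))
        history.append(line)
        if len(history) > context:
            history.pop(0)
    return hits
```

Calling `find_cpu_resets(open("vmware.log", errors="replace"))` on a node's log would return each reset with its line number and the lines leading up to it, which can then be compared by hand with the time the surviving node reported losing a member.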

I've had similar problems with Oracle 9i RAC on virtual machines using ESX 2.5.x, though they never resulted in node restarts, just lost communications and instances going down. Once I put that configuration on physical hardware, I had no problems whatsoever. I can only assume that once I put this 10g configuration on physical hardware, these particular problems will vanish as well (I hope!).

Does ANYBODY have ANY ideas on what could cause (probable) Oracle clusterware problems on ESX Server? This isn't production, but it bugs the living heck out of me. It can't be pure coincidence that I've had problems on both 9i and 10g RAC on both ESX 2.5.x and 3.0.

Thanks...

Charles R. Whealton
Charles Whealton @ pleasedontspam.com

Received on Wed Feb 28 2007 - 19:56:05 CST
