Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: Oracle 10g RAC crashes and VMware ESX Server

From: Chuck Whealton <chuck_whealton_at_yahoo.com>
Date: 4 Mar 2007 12:42:12 -0800
Message-ID: <1173040931.910272.274730@i80g2000cwc.googlegroups.com>


On Mar 1, 5:26 pm, DA Morgan <damor..._at_psoug.org> wrote:
> Chuck Whealton wrote:
> > Everybody:
>
> > I've searched the VMTN forums and found no solutions to this one (just
> > another similar question with no resolution) and I found nothing of
> > any use on Oracle's Metalink. I also looked at some of the postings
> > in this newsgroup, but I saw nothing along the lines of what I'm doing
> > and the problems I'm having. This posting is mainly a copy of a
> > posting I made in the VMware Technology Network discussion groups.
>
> > I'm running ESX 3.0 on a couple of BL20p G3 blades. The storage is on
> > an EVA5000 disk array (3.025 firmware), using two separate fabrics.
> > The shared storage was created on a vDisk and zeroed, connected to
> > each VM with the proper support for shared storage (i.e. - virtual
> > bus, separate controller, etc.). I have two Windows 2003 Server VMs,
> > an internal VM network for the Oracle RAC interconnect with private IP
> > addressing, as well as a standard public network. I'm on Oracle 10g
> > (10.2.0.3.0).
>
> > This problem ONLY happens with my RAC nodes so I'm 99% sure it's
> > cluster related as I have no problems with standalone Oracle databases
> > on VMs.
>
> > My W2K3 virtual machines crash at random. I can't find anything useful
> > in the vmware (vmkernel, hostd.log, etc.) logs, however, if I look at
> > the vmware.log for one of the nodes, the crashes start off with a
> > "vcpu-0| CPU reset| soft". That's it, and that's not enough. The
> > Oracle alert logs have nothing of any use beyond the usual remaining
> > node indicating that it's lost connection with a member.
>
> > I've had similar problems with Oracle 9i RAC on virtual machines using
> > ESX 2.5.x, though they never resulted in node restarts, just lost
> > communications with instances going down. Once I put that
> > configuration on physical hardware - I had no problems whatsoever. I
> > can only assume that once I put this 10g configuration on physical
> > hardware, these particular problems will vanish also (I hope!).
>
> > Does ANYBODY have ANY ideas on what could cause (probable) Oracle
> > clusterware problems on ESX Server? This isn't production, but it bugs
> > the living heck out of me. It can't be pure coincidence that I've had
> > problems on both 9i and 10g RAC on both ESX 2.5.x and 3.0.
>
> > Thanks...
>
> > Charles R. Whealton
> > Charles Whealton @ pleasedontspam.com
>
> What occurs to me is that your configuration is not one supported
> by Oracle and, unless this is for playing at home, not likely to
> have a happy ending.
>
> That said ... have you run the 10g cluster verifications? What do
> they tell you?
> --
> Daniel A. Morgan
> University of Washington
> damor..._at_x.washington.edu
> (replace x with u to respond)
> Puget Sound Oracle Users Group www.psoug.org
>

Dan:

I made it in today to deal with other problems, so while I was here, I ran cluvfy for several stages. Rather than post the entire output, I'll post only what FAILED (note that I changed any system names, etc., for obvious reasons).

These two are from the database preinstallation stage test.



Check: Total memory
  Node Name     Available                 Required                  Comment

OK, I configured my virtual machines with 512MB of memory each. Not sure what happened to the .55 MB and I wouldn't suspect that's the problem, but then again, I'm sure stranger things have happened.



Check: Free disk space in "C:\TMP\1" dir
  Node Name     Available                 Required                  Comment

OK, for that one, I have no idea WHY it failed. The environment variable is properly defined for everybody and the directories DO exist on both nodes with ample space. Why it says "unknown" is beyond me. Note that when I ran this again, it came back as passing.

Both of those failures were for the database. There were no failures for CRS integrity in this test.

If I run the pre crsinst stage, I only get the memory failure from above (511.45MB versus 512MB). For whatever reason, I don't get the same TMP space failure.
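
To make the memory failure concrete, here is a minimal Python sketch of a strict "available must be at least required" comparison using the two figures quoted above. It only illustrates the arithmetic; the function name memory_check is hypothetical, and I am not claiming this is how cluvfy implements its check.

```python
# Hypothetical helper: NOT cluvfy's code, just the arithmetic of a strict threshold check.
def memory_check(available_mb: float, required_mb: float) -> tuple[bool, float]:
    """Return (passed, shortfall_mb) for a strict available >= required test."""
    shortfall = max(0.0, required_mb - available_mb)
    return available_mb >= required_mb, round(shortfall, 2)


# Figures from the cluvfy output described above: 511.45MB reported, 512MB required.
passed, shortfall = memory_check(511.45, 512.0)
print(passed, shortfall)  # False 0.55
```

With a strict comparison like this, a node reporting even slightly less than the requirement fails, which would explain why a VM configured with 512MB still trips the check.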

If I do a post check for crsinst I get no failures at all.

Does anything stand out to you or are there specific cluster verification tests you feel may be beneficial to know the output of?

Thanks very much for taking the time.

Charles R. Whealton
Charles Whealton @ pleasedontspam.com

Received on Sun Mar 04 2007 - 14:42:12 CST
