RE: Oracle Event Monitoring and VMWare

From: Michael Schmitt <mschmitt_at_uchicago.edu>
Date: Wed, 18 Sep 2013 14:56:28 +0000
Message-ID: <1184E7EFAB1D1C47A5038D06F64BE92611905DE3_at_XM-MBX-02-PROD.ad.uchicago.edu>



We see something similar so I will be interested in what comes of this thread.

I am not sure if I am completely correct on the below explanation, but this is my understanding from our ESX admin.

As I understand it, the problem comes from allocating multiple virtual CPUs to a VM while also allocating more virtual CPUs in total than the host has physical cores. The way VMware works is that if you allocate 4 virtual CPUs to a VM, the VM has to grab all 4 cores every time it wants to do work. If it cannot find all 4 cores available on the physical machine at the same time (it needs all 4, not a subset), it will run into some form of wait. The chance of getting multiple cores at the same time drops when you have over-allocated virtual CPUs relative to physical CPUs.
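To put rough numbers on that last point, here is a toy calculation. It is only a sketch of the strict "all cores at once" behaviour as described above, and it assumes each physical core is idle independently with the same probability, which a real hypervisor scheduler will not follow exactly. The function name and the 50% idle figure are made up for illustration.

```python
from math import comb


def prob_at_least_k_idle(n_cores: int, k_needed: int, p_idle: float) -> float:
    """Probability that at least k_needed of n_cores are idle at one instant.

    Toy binomial model: every core is idle independently with
    probability p_idle. This is an assumption for illustration only.
    """
    return sum(
        comb(n_cores, j) * p_idle**j * (1 - p_idle) ** (n_cores - j)
        for j in range(k_needed, n_cores + 1)
    )


if __name__ == "__main__":
    # 4 physical cores, each assumed idle half of the time.
    for vcpus in (1, 2, 4):
        p = prob_at_least_k_idle(4, vcpus, 0.5)
        print(f"{vcpus} vCPU VM finds enough idle cores with probability {p:.4f}")
```

Under these assumptions a 1 vCPU VM almost always finds a free core, while a 4 vCPU VM rarely finds all four free at once, which is the effect described above.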

It seems the majority of your problem is a result of not having the support you need. A good ESX admin should be able to work with you. When I saw the Blue Medora plugin that Kyle mentioned, I thought it would be great for us. Unfortunately, our ESX team just got a different monitor and does not want to test it out at the moment.

-----Original Message-----

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Dba DBA
Sent: Tuesday, September 17, 2013 2:57 PM
To: ORACLE-L
Subject: Oracle Event Monitoring and VMWare

Some of the DBAs on here cross over between being DBAs and SAs. This is a question more for those guys. I am rather disturbed by what I wrote below.

Oracle 11.2.0.3
VMware ESXi

4 CPU Server
7 VMs on the server
3 DBs (use internal disks for storage)
4 Application Server VMs (use SAN for storage; low-end SAN with a low-end pipe, and we ran out of space on it)

Issue:

1. DB performance issue
2. Checked the events in all 3 DBs. A variety of events, all related to disk, on all DBs (mainly used ASH, but also ran some 10046 traces). It was pretty clear.
3. No excessive work is going on.
4. Figured this was some kind of disk issue, since the DBs run separately on internal disks and the other VMs would not be impacting them.


Answer: Not even close. CPU issue on the host (no CPU issue on the boxes).
13 virtual cores assigned to 7 VMs
4 cores on the server
Race condition (the SA doesn't know this term, but it's my interpretation of his explanation) and a number of context switches. The cores have to keep grabbing different threads coming from the VMs. My guess is that this reduces the benefit of CPU caching, since the cache for a specific VM keeps getting pushed out. (Best guess. I'm not a hardware guy.)

CPU Issue: I may see disk at the VM level, but this is completely irrelevant to what is going on at the host.

Solution: Each VM gets allocated 1 virtual core
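For reference, the overcommit arithmetic behind these figures can be sketched as follows. The function name is made up for illustration, and the "after" figure assumes all 7 VMs really do end up with exactly 1 virtual core each.

```python
def overcommit_ratio(vcpus_assigned: int, physical_cores: int) -> float:
    """Virtual cores assigned across all VMs per physical core on the host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return vcpus_assigned / physical_cores


if __name__ == "__main__":
    # Figures from this message: 13 virtual cores over 7 VMs on a 4-core host.
    before = overcommit_ratio(13, 4)
    # Assumed outcome of the proposed fix: 7 VMs x 1 virtual core each.
    after = overcommit_ratio(7 * 1, 4)
    print(f"before: {before:.2f} virtual cores per physical core")
    print(f"after:  {after:.2f} virtual cores per physical core")
```

Even after the change the host would still be overcommitted, just by less, and no single VM would need more than one core at a time.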

Per my SA: Giving extra Virtual Cores to a VM does not impact performance. It does not matter if the application can handle multiple cores better, those cores are just 'threads' at the OS level. So it doesn't matter.

This raises a few disturbing questions...

  1. Does the number of virtual cores a VM is allocated have ANY meaning whatsoever? I had assumed that this represented a % of the CPU allocated to each VM.
  2. In a virtual environment, how do we interpret Oracle events? They appear to be meaningless at the host tier. I know that excessive LIOs, locking issues and the like can be meaningful, but in general I don't really have a lot of data to give an SA to help diagnose the problem.
  3. How do I work with an SA if I only have access to the database inside the VM? No Unix access in operations. The operations team gives very minimal support on performance issues. I do not even have direct contact with them. If I see events pointing to 'serial disk reads', that information appears to be meaningless.
  4. Per my SA, VMWare does not automatically store historical performance data and he needs to look at it in real time. I have to open tickets to reach operations and days go by... So the issues can clear up. Operations will use Hyper-V. Are there things that Hyper-V will automatically store that I can ask them to look at to compare? I want to increase my knowledge about this so at a minimum I can communicate better with the operations SAs. So we don't talk through each other.

--

http://www.freelists.org/webpage/oracle-l

Received on Wed Sep 18 2013 - 16:56:28 CEST