Re: Database upgrade/move I/O waits performance

From: Tim Gorman <tim.evdbt_at_gmail.com>
Date: Tue, 23 Jul 2019 19:40:40 -0700
Message-ID: <e58f0b91-119c-64d7-3f17-6cac7d54c8bb_at_gmail.com>



Lyall,

One thing from your description that jumps out at me is...

     >> Old: /physical hardware, 512g RAM, 32 CPU/
     >> New: /VMWare VM, 224g RAM, 24 vCPU/


Do you know if the new VM has 100% vCPU reservations and 100% vRAM reservations?  Also, do you know whether hyper-threading (default: 2 threads per core) is still enabled on the ESXi hosts, and whether "HT Sharing" is still set to "Any" in the VM?

Without vCPU and vRAM reservations, other VMs can steal resources from this VM, making performance difficult to predict or manage, to put it mildly.

Even if reservations are set, hyperthreading on the physical cores makes a vCPU equal to roughly half of a core.  So even if the retired HW was substantially slower per core, the net effect of stolen resources and/or hyperthreading might be far less processing power from the newer HW than even the smaller numbers imply.
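
As a back-of-the-envelope illustration of that last point, here is a small Python sketch that converts a vCPU count into physical-core equivalents.  The function name and the flat "divide by threads per core" rule are simplifying assumptions of mine (how much a hyperthread is really worth depends on the workload), and I am also assuming that the old server's "32 CPU" meant 32 physical cores, which you should verify.

```python
def core_equivalents(vcpus: int, threads_per_core: int = 2) -> float:
    """Approximate physical cores backing `vcpus` when each core
    is presented as `threads_per_core` hardware threads."""
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    return vcpus / threads_per_core


# Assumption: the retired server's 32 CPUs were 32 physical cores.
old_cores = 32
# New VM from the original post: 24 vCPU, hyperthreading assumed enabled.
new_cores = core_equivalents(24)

print(f"old: {old_cores} cores, new: about {new_cores:g} core equivalents")
print(f"new/old ratio: {new_cores / old_cores:.3f}")
```

Under those assumptions the 24 vCPUs work out to about 12 core equivalents, which is a much bigger drop from 32 than "24 vs 32" suggests, and that is before any resource stealing.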

Many organizations bought into VMs for the economic benefit of creating dozens (or hundreds) of virtual machines from a set of physical machines, over-subscribing resources like CPU, RAM, and network.  And many of the "standards" that organizations have developed for VMs are intended to ensure that each VM "plays nice" in such an over-subscribed environment.  Hyperthreading and resource stealing/sharing are part of making this economic model work.

vCPU/vRAM reservations are not part of "playing nice" in such an environment;  rather, they are methods to prevent over-subscription.  As a result, organizations that blindly follow standards established to enable over-subscription usually leave reservations unset.

What many organizations fail to do is create another standard for VMs supporting mission-critical workloads which must have predictable and manageable performance, and certainly "/the company's CRM OLTP Oracle database/" falls into that category.

If you're dealing with VM management according to the economic model, then you have an additional invisible layer of performance problems.  Even if the VMware admin states "I never over-subscribe", without reservations they cannot prove it and they cannot prevent it.
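
To make the "cannot prove it" point concrete, here is a toy Python model of the check you would want the VMware admin to be able to pass.  Everything in it is hypothetical: the host size, the VM names and sizes, and the helper names are made up for illustration, and it counts reservations in whole vCPUs to keep the arithmetic simple, whereas real ESXi reservations are expressed in MHz (CPU) and MB (RAM).

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    reserved_vcpus: int = 0  # simplification: real reservations are set in MHz


def oversubscription_ratio(vms, host_logical_cpus):
    """Total vCPUs handed out divided by logical CPUs on the host."""
    return sum(vm.vcpus for vm in vms) / host_logical_cpus


def not_fully_reserved(vms):
    """Names of VMs whose reservation covers less than their vCPU count."""
    return [vm.name for vm in vms if vm.reserved_vcpus < vm.vcpus]


# Hypothetical host and inventory, made up for illustration only.
HOST_LOGICAL_CPUS = 48
inventory = [
    VM("crm-db", vcpus=24, reserved_vcpus=24),
    VM("app-1", vcpus=16),
    VM("app-2", vcpus=16),
]

ratio = oversubscription_ratio(inventory, HOST_LOGICAL_CPUS)
print(f"vCPU : logical CPU ratio = {ratio:.2f}")
print("VMs without full reservation:", not_fully_reserved(inventory))
```

A ratio above 1.0 means more vCPUs have been handed out than the host has logical CPUs, and any VM named in the second list has no guarantee when the host gets busy.  The question to put to the VM team is whether your database VM would show up in that second list.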

Hope this helps...

-Tim

On 7/23/19 09:39, Lyall Barbour wrote:
> Hello gurus,
>   We've upgraded and moved the company's CRM OLTP Oracle database.
> Old: physical hardware, 512g RAM, 32 CPU -- Oracle 11.2.0.4
> Enterprise, 16g MEMORY_TARGET, SPM baselines used, one or two Profiles.
> New: VMWare VM, 224g RAM, 24 vCPU -- Oracle 12.1.0.2 Enterprise, 72g
> SGA_TARGET, HugePages, 12g PGA_TARGET, no SPM baselines, multiple
> Profiles.
> Having said all that, there's a consistency to the slowness that's
> very interesting.  We are constantly having I/O waits on all (?)
> queries, if not all, the main top sqls that are running in
> v$session_longops.
> Comparing one query.  Explain Plan, Cost, disk reads, buffer gets, all
> line up pretty well in 11g and 12c.  The 2 things that are different
> is the actual time, slowest in 11g is 2.5 seconds, 12c is 14 seconds. 
> and percentage of work, 11g disk work is 85% and cpu work is 15%.  12c
> disk work is 98%, cpu 2%.
> I'm not trying to get any solutions from everyone here, i'm trying to
> get direction on where to look next so i know who to talk to at my
> company with this issue.  Do i talk to the SAN folks and see what
> their graphs are like?  VM team?  keep digging into AWR stats and
> comparisons on the database?
> Comparing AWR hour periods from today (oracle 12.1) to last week,
> tuesday, (oracle 11.2) the amount of buffer gets and disk reads all
> line up with the issue i described above. 420T of buffer cache
> reads last week, 10T today, in that hour period.  ALL work is waiting
> on disk.
> Any help... helps.
> Thanks,
> Lyall Barbour
> -- http://www.freelists.org/webpage/oracle-l .

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jul 24 2019 - 04:40:40 CEST
