Re: LIO/sec per CPU limit? Is it Hardware or Oracle code?

From: Karl Arao <karlarao_at_gmail.com>
Date: Thu, 10 Aug 2017 19:16:02 -0400
Message-ID: <CACNsJndhfybzsnjLcPPh6_E1BcCX3+vmwhBdvN=sAr2M=01DCw_at_mail.gmail.com>



Henry,

To me, the cpu MHz difference (2299.908 vs. 2300.032) suggests the two boxes may have CPUs of a different make and model.
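If you want to compare the boxes without eyeballing the raw file, here is a minimal Python sketch (my own illustration, not part of cputoolkit). It summarizes the `model name` and `cpu MHz` fields of /proc/cpuinfo so the output from the two boxes can be diffed, and it assumes the x86 Linux layout of that file. Keep in mind that `cpu MHz` may show the current operating frequency when frequency scaling is active, so a gap as small as this one (about 0.005%) can be noise. The model name is the more telling field.

```python
# Sketch: summarize /proc/cpuinfo so two hosts can be compared side by side.
# Assumes the x86 Linux "key : value" layout; other architectures use other keys.
import os
from collections import Counter


def summarize_cpuinfo(text):
    """Return (Counter of model names, list of cpu MHz values)."""
    models = Counter()
    mhz = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "model name":
            models[value] += 1
        elif key == "cpu MHz":
            mhz.append(float(value))
    return models, mhz


def pct_diff(a, b):
    """Relative difference of b versus a, in percent."""
    return abs(b - a) / a * 100.0


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        models, mhz = summarize_cpuinfo(f.read())
    for name, count in models.items():
        print(f"{count} x {name}")
    if mhz:
        print(f"cpu MHz min/max: {min(mhz)} / {max(mhz)}")
    # The two MHz values quoted in this thread differ by about 0.005%.
    print(f"{pct_diff(2299.908, 2300.032):.4f}%")
```

Run it on both servers and diff the output; identical model names with slightly different MHz readings point away from a hardware model difference.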
In the chart linked below (built from spec.org data), if you search for any “3.4 GHz” Xeon CPU, it shows several different CPU models (E3, E5, E7), and these CPUs have different speed/core ratings at the same “3.4 GHz”.

Look at the red-boxed dots for Q3 2016 and Q4 2016. The Q3 2016 PowerEdge FC630 (Intel Xeon E5-2640v4, 3.40 GHz) scores 44.4 speed/core, while the Q4 2016 Lenovo ThinkServer RS160 (Intel Xeon E3-1230v5, 3.40 GHz) scores 64 speed/core, both at “3.4 GHz”. This means 1 E5-2640v4 CPU is equivalent to 0.69 of an E3-1230v5 CPU (44.4/64 = 0.69); put another way, moving from the E5-2640v4 to the E3-1230v5 gives you a 44.1% speedup ((64 - 44.4)/44.4 = 44.1%).
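The arithmetic above as a small Python sketch. The function names are just for illustration, and the inputs are the speed/core values read off the chart:

```python
# Sketch: compare two CPUs by per-core speed rating instead of GHz.
# Inputs are the "speed/core" values quoted above from the chart.


def relative_capacity(slow, fast):
    """How many 'fast' cores one 'slow' core is worth."""
    return slow / fast


def speedup_pct(old, new):
    """Percent speedup when moving from 'old' to 'new' per-core speed."""
    return (new - old) / old * 100.0


e5_2640v4 = 44.4  # PowerEdge FC630, Xeon E5-2640v4, "3.40 GHz"
e3_1230v5 = 64.0  # ThinkServer RS160, Xeon E3-1230v5, "3.40 GHz"

print(f"{relative_capacity(e5_2640v4, e3_1230v5):.2f}")  # 0.69
print(f"{speedup_pct(e5_2640v4, e3_1230v5):.1f}%")        # 44.1%
```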

https://public.tableau.com/profile/karlarao#!/vizhome/shared/JTJSGQBSC , http://i.imgur.com/4kEKtv4.png

The key here is to normalize speed per core to find out which CPU is faster. Don't judge speed by GHz, because a higher clock rate doesn't necessarily mean a faster CPU. The example below shows the Xeon E5-2643 across versions: the E5-2643v2 runs at 3.5 GHz vs. 3.4 GHz for the newer E5-2643v4, yet the E5-2643v4 is still faster at 61 speed/core (vs. 53 speed/core).
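The same idea as a small Python sketch. The `Cpu` record is hypothetical and the numbers are the ones quoted above; the point is that ranking by clock rate and ranking by speed/core give different answers:

```python
# Sketch: rank CPUs by per-core speed, not clock rate.
# The Cpu record is a hypothetical illustration; values come from the chart.
from typing import NamedTuple


class Cpu(NamedTuple):
    model: str
    ghz: float
    speed_per_core: float


def gain_pct(old, new):
    """Percent per-core gain when moving from 'old' to 'new'."""
    return (new - old) / old * 100.0


cpus = [
    Cpu("E5-2643v2", 3.5, 53.0),
    Cpu("E5-2643v4", 3.4, 61.0),
]

by_ghz = max(cpus, key=lambda c: c.ghz)              # highest clock rate
by_core = max(cpus, key=lambda c: c.speed_per_core)  # fastest per core

print("highest clock:", by_ghz.model)
print("fastest per core:", by_core.model)
print(f"per-core gain: {gain_pct(53.0, 61.0):.1f}%")
```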

https://public.tableau.com/profile/karlarao#!/vizhome/shared/5QRGYC3FN , http://i.imgur.com/7jHfrxl.png

I have a tool I use for CPU speed comparisons that doesn’t need any database objects to be created; the README explains the details: https://github.com/karlarao/cputoolkit. For a quick test, just execute the following:

-- saturate 4 CPUs on the dw database; runs for 1 hour, but you can kill it anytime
./saturate
./saturate 4 dw

-- query the load profile from v$sysmetric; runs periodically and appends to a text file
loadprof.sql
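For reference, v$sysmetric already reports per-second rates. If you instead sample a cumulative counter such as `session logical reads` (from v$sysstat, or per session from v$sesstat, which is what snapper samples), the rate is just the delta over the elapsed time. A minimal Python sketch, assuming you have already fetched two timestamped snapshots; the counter values are made up so that they reproduce the ~800k vs. 350k LIO/sec Henry reported:

```python
# Sketch: turn two snapshots of a cumulative counter into a per-second rate.
# Snapshot values are hypothetical, chosen to match the rates in this thread.
from dataclasses import dataclass


@dataclass
class Snapshot:
    taken_at: float     # seconds, e.g. from time.time()
    logical_reads: int  # cumulative 'session logical reads' value


def rate_per_sec(first, second):
    """Counter delta divided by elapsed seconds."""
    elapsed = second.taken_at - first.taken_at
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    return (second.logical_reads - first.logical_reads) / elapsed


a = rate_per_sec(Snapshot(0.0, 1_000_000), Snapshot(10.0, 9_000_000))
b = rate_per_sec(Snapshot(0.0, 1_000_000), Snapshot(10.0, 4_500_000))
print(f"{a:,.0f} vs {b:,.0f} LIO/sec, ratio {a / b:.2f}")
# 800,000 vs 350,000 LIO/sec, ratio 2.29
```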

In my experience, LIO/sec is limited by the hardware, but it can also be limited or affected by the following OS and database features:

OS
  - CPU binding

Database
  - 12c threaded execution
  - cgroups / 12c processor_group_name
  - instance caging
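On the OS side, CPU binding can be checked from Python on Linux with `os.sched_getaffinity`. A minimal sketch (Linux only): pass an Oracle server process PID instead of 0 to inspect that process, which is roughly what `taskset -p <pid>` reports. It only shows affinity/cpuset binding; CPU quotas, threaded execution and instance caging won't be visible this way.

```python
# Sketch: check whether a process is bound to a subset of CPUs (Linux only).
import os


def allowed_cpus(pid=0):
    """Sorted CPU ids the process may run on; pid 0 means this process."""
    if not hasattr(os, "sched_getaffinity"):
        raise OSError("sched_getaffinity is not available on this platform")
    return sorted(os.sched_getaffinity(pid))


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    cpus = allowed_cpus()
    total = os.cpu_count()
    print(f"allowed on {len(cpus)} of {total} CPUs: {cpus}")
```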

-Karl

On Thu, Aug 10, 2017 at 12:46 PM, Henry Poras <henry.poras_at_gmail.com> wrote:

> Thanks for all of the suggestions. Here is where I am so far:
>
> Kevin - SLOB was always on my list of things I wanted to try and for some
> reason never got around to it (I don't mean for this problem, I mean going
> back a bunch of years). My question here relates to the fact that I can't
> take these machines offline to run a test. Doesn't SLOB hammer the
> resources hard enough that I'd really need to run it on a test machine,
> not while our system is up and running (poorly)? I'm going over some of
> your docs to see.
>
> Tony - I'll ask the sysadmins to check, but it's tough without knowing what to
> ask them to look for.
>
> Karl - Sort of like what I looked at in /proc/cpuinfo, but much easier to
> read. After looking again, the two systems look identical from this level.
> Well, almost. cpu MHz is ~0.005% different (2299.908 vs. 2300.032). Doesn't
> seem like enough of a difference to explain my observations.
>
> Mark - both have same hugepage configuration. Same HugePages_Free, Rsvd,
> Total, and size.
>
> Bhavani - I can't run AWR, but I ran snapper on the same query in order to
> compare resources, latches, and statistics.
>
> Hans - Looking for differences, but not sure where to look.
>
> MWF - do you know if there is a way to do this without being root?
>
> Stefan - Thanks for the links. Haven't read these in a while. I'll see
> what I can use.
>
> I'll post more if/when I have it.
>
> Henry
>
>
>
> On Thu, Aug 10, 2017 at 11:13 AM, Reen, Elizabeth <elizabeth.reen_at_citi.com> wrote:
>
>> Are the disks set up identically?
>>
>> Liz
>>
>> *From:* oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] *On Behalf Of* Henry Poras
>> *Sent:* Wednesday, August 09, 2017 5:46 PM
>> *To:* ORACLE-L
>> *Subject:* LIO/sec per CPU limit? Is it Hardware or Oracle code?
>>
>> I have two identical servers (or so I am told), but application work is
>> running 2-3 times slower on one than the other. Using Tanel's snapper, I
>> see that all active sessions are on CPU. Viewing top shows me the same
>> thing: each session pegs a CPU. We also found that it wasn't particular SQL
>> that slowed down across servers; it looked like everything was slow. A
>> select count(*) from dba_objects showed this behavior as did Jonathan
>> Lewis's kill_cpu script. This gave me something to test with. Running a
>> 10046, I saw the same amount of resource utilization (parse count, fetch
>> count, cr count, ...), no contention (wait events), but one server finished
>> 2.5 times faster than the other. Looking at session stats through snapper,
>> I see that the number of session logical reads per sec (~all of which are
>> consistent reads) is ~ 2.5 times higher on one server than the other. That
>> explains why it takes one longer to finish.
>>
>> So, now what?? Why is one server giving me 350k consistent gets per
>> second and the other ~800k? Is it hardware? /proc/cpuinfo shows the same
>> CPU for each box. Is it hidden in the Oracle code path? I realize that not
>> all LIOs are created equal, but how do I check this? I am running on
>> SE 12.1.0.1.
>>
>> Any and all thoughts welcome.
>>
>> Henry
>>
>
>

-- 
Karl Arao
Blog: karlarao.wordpress.com
Wiki: karlarao.tiddlyspot.com
Twitter: _at_karlarao <http://twitter.com/karlarao>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Aug 11 2017 - 01:16:02 CEST
