Fwd: LIO/sec per CPU limit? Is it Hardware or Oracle code?

From: Henry Poras <henry.poras_at_gmail.com>
Date: Fri, 11 Aug 2017 13:05:35 -0400
Message-ID: <CAK5zhLKFxv30Qe4z4UX5LWU9j77_69PdjbmzBjAtB=7xv-mfvg_at_mail.gmail.com>

Do me a favor and CC the Oracle list :) I am not registered yet.

  1. Is NUMA active at the server level or Oracle level ?
  2. Check the memory speeds per other suggestions.
  3. Are they native machines, or some sort of VM ? If VM then other workload may be impacting your VM.
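
For question 1, the server-level half can be checked from sysfs. This is only a sketch: it assumes a Linux kernel that exposes NUMA nodes as node0, node1, ... directories under /sys/devices/system/node. The helper name numa_node_count is made up for illustration, and the optional directory argument exists only so the function can be pointed at a test tree. It says nothing about whether Oracle itself is NUMA-aware.

```shell
# Sketch: count the NUMA nodes the Linux kernel exposes under sysfs.
# numa_node_count is a hypothetical helper name, not a standard tool.
numa_node_count() {
    base="${1:-/sys/devices/system/node}"
    count=0
    for d in "$base"/node[0-9]*; do
        # An unmatched glob stays literal and fails the -d test, giving 0.
        [ -d "$d" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Running numa_node_count on both servers and comparing the results is the point; a count above 1 means the OS sees more than one NUMA node.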

You said the CPUs were identical in /proc/cpuinfo, so I assume it's the exact same CPU model. Here is a snippet from one of my machines:

processor       : 31
vendor_id       : GenuineIntel
cpu family      : 6
model           : 47
model name      : Intel(R) Xeon(R) CPU E7- 4820  @ 2.00GHz  <-- E7-4820 is the exact cpu model.
stepping        : 2
cpu MHz         : 1997.881
cache size      : 18432 KB

If the cpu models are in any way different, then, well, they are different and your performance could be different. I would not worry about a few percentage points on the cpu MHz; that is not going to make much of a difference. You can look on the Intel ARK website for specific specs on the cpu. Also check whether your cpu speed changes due to tuned, cpuspeed or similar daemons.
Run cat /proc/cpuinfo | grep MHz. If the various numbers are less than the rated speed of the CPU, then the clock speed is being shifted up and down based on load. There is nothing wrong with that in principle, and it should barely change execution times, but it can confuse Oracle and cause different execution / explain plans. It's best to just turn it off until it can be ruled out as the cause, but it comes at the expense of a $10-50 higher electric bill per month, which for a production system is probably trivial.

cat /proc/cpuinfo | grep MHz

cpu MHz         : 1997.881
cpu MHz         : 1997.881
cpu MHz         : 1997.881
cpu MHz         : 1997.881
cpu MHz         : 1997.881
cpu MHz         : 1997.881  <- if it comes back like 1200, 1600 etc. or something lower, then it has an energy saving mode engaged (aka cpuspeed, Intel SpeedStep).
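
The eyeball check above can be scripted so it is easy to run on both servers. This is a rough sketch, not a definitive test: it assumes the rated speed can be parsed from the "model name" line (the "2.00GHz" part), and it treats anything under 90% of that as downshifted. The function name slow_cores and the 90% threshold are illustrative choices, not anything from the thread.

```shell
# Sketch: count cores whose current clock is well below the rated speed.
# Assumptions: rated speed is taken from the "model name" line (e.g. 2.00GHz)
# and "well below" means under 90% of it. Both are illustrative choices.
slow_cores() {
    cpuinfo="${1:-/proc/cpuinfo}"
    awk -F: '
        /^model name/ && match($2, /[0-9.]+GHz/) {
            # Drop the trailing "GHz" and convert to MHz.
            rated = substr($2, RSTART, RLENGTH - 3) * 1000
        }
        /^cpu MHz/ {
            total++
            if (rated > 0 && $2 + 0 < 0.9 * rated) slow++
        }
        END { printf "%d of %d cores below 90%% of rated speed\n", slow, total }
    ' "$cpuinfo"
}
```

Call slow_cores with no argument to read /proc/cpuinfo. If the model name carries no GHz figure, the function reports 0 slow cores, so a zero is only meaningful when the rated speed was actually found. The reading is also a point-in-time sample, so it is worth repeating while the test SQL is running.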

-----Henry Poras <henry.poras_at_gmail.com> wrote: -----
To: Jon Crisler <jcrisler_at_us.ibm.com>
From: Henry Poras <henry.poras_at_gmail.com>
Date: 08/10/2017 12:50PM
Subject: Re: LIO/sec per CPU limit? Is it Hardware or Oracle code?


Thanks for the suggestions. Looked through most of your stuff and nothing shows up. I think it's more cpu/memory related, however. Basically every session (including my test sql) grabs and pegs a cpu. Snapper and v$session (and top) all show ~100% cpu (well, minimal pio). So cutting logical read rate in half will double runtimes. That's what I am seeing. But what would cut the lio rate like that?
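
The rate-versus-runtime argument can be checked with quick arithmetic. A minimal sketch, assuming the session is fully CPU-bound so that elapsed time is roughly total logical reads divided by the LIO rate. The helper name expected_slowdown is made up for illustration, and the two rates are the approximate figures from the original post (~800k and ~350k consistent gets per second).

```shell
# Sketch: for a CPU-bound session doing the same number of logical reads,
# the runtime ratio between two servers is the inverse ratio of LIO rates.
# expected_slowdown is a hypothetical helper; rates are gets per second.
expected_slowdown() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

expected_slowdown 800000 350000   # prints 2.29
```

That is in the same ballpark as the ~2.5x runtime difference measured, so the lower LIO rate accounts for essentially the whole gap.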


On Wed, Aug 9, 2017 at 10:26 PM, Jon Crisler <jcrisler_at_us.ibm.com> wrote:

> I cannot respond directly to the list, but obviously you have some
> difference. It could be hardware related, OS related or Oracle related.
> A few things to check-
> 1) Are your disks on SAN or NFS / Ethernet ? Sometimes what seems to be
> identical disks are really not the same on the backend disk array. Channel
> speeds, ethernet differences etc.
> - example- iSCSI on ethernet- one system has jumbo frames turned on, the
> other does not, and the end result is a significant difference in IO
> 2) Are the CPUs exactly identical ? Same exact model as shown in
> /proc/cpuinfo ? Look at the reported cpu speed- is cpuspeed maybe
> downshifting the cpu ?
> 3) Does oracle understand about the underlying hardware ? See this as a
> starter to compare oracle's knowledge of hardware- SELECT * FROM
> - that might not be the only view to look at-
> #3 has bit me many times when otherwise identical systems give different
> execution plans.
> 4) DBMS_STATS.GATHER_SYSTEM_STATS - proc to run various stats so Oracle
> understands the hardware
> 5) Otherwise identical systems- but installed memory is not the same. One
> has support for NUMA, the other does not, or something goofy with memory
> mirroring / interleaving. So memory access is not the same as one machine
> appears much faster.
> Are they on VM's or something similar ? If that is the case, then other
> workload on the system might be affecting you. I put a lot of emphasis on
> PIO although your question is LIO, but you have to check everything.
> ORACHK might be helpful as well to try to identify differences.
> -----oracle-l-bounce_at_freelists.org wrote: -----
> To: ORACLE-L <oracle-l_at_freelists.org>
> From: Henry Poras
> Sent by: oracle-l-bounce_at_freelists.org
> Date: 08/09/2017 05:47PM
> Subject: LIO/sec per CPU limit? Is it Hardware or Oracle code?
> I have two identical servers (or so I am told), but application work is
> running 2-3 times slower on one than the other. Using Tanel's snapper, I
> see that all active sessions are all on CPU. Viewing top shows me the same
> thing, each session pegs a cpu. We also found that it wasn't particular SQL
> that slowed down across servers, but it looked like everything was slow. A
> select count(*) from dba_objects showed this behavior as did Jonathan
> Lewis's kill_cpu script. This gave me something to test with. Running a
> 10046, I saw the same amount of resource utilization (parse count, fetch
> count, cr count, ...), no contention (wait events), but one server finished
> 2.5 times faster than the other. Looking at session stats through snapper,
> I see that the number of session logical reads per sec (~all of which are
> consistent reads) is ~ 2.5 times higher on one server than the other. That
> explains why it takes one longer to finish.
> So, now what?? Why is one server giving me 350k consistent gets/per second
> and the other is ~800k? Is it hardware? /proc/cpuinfo shows the same cpu
> for each box. Is it hidden in the Oracle code path? I realize that not all
> LIO are created equal, but how do I check this? I am running on SE12.1.0.1
> Any and all thoughts welcome.
> Henry

Received on Fri Aug 11 2017 - 19:05:35 CEST
