Re: LIO/sec per CPU limit? Is it Hardware or Oracle code?

From: Niall Litchfield <niall.litchfield_at_gmail.com>
Date: Thu, 10 Aug 2017 21:53:35 +0100
Message-ID: <CABe10saNcFO90Z-b+tJP8HWmiJS_5ScXUJmNR9Ys-8sE=TitCg_at_mail.gmail.com>



Since we're in guessing-without-evidence territory, but given the order of magnitude of difference described, I'd go for:

  1. Different # of cores/CPUs enabled per box (assuming there isn't a v3 vs v2 same-number thing going on).
  2. Power management is enabled on one box but not the other.
  3. Memory configuration/speed is different.
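
A rough way to check the first two guesses without root is to summarise /proc/cpuinfo and the cpufreq governor on each box and compare the output. The sketch below is only that, a sketch: it assumes an x86 Linux layout for /proc/cpuinfo, the function names are made up for illustration, and the scaling_governor files may simply be absent (e.g. on a VM or with no cpufreq driver loaded), in which case it reports nothing for the governor. Note that "cpu MHz" is a point-in-time reading, so it is worth sampling while the box is busy.

```python
import glob
import os
from collections import Counter


def summarize_cpuinfo(text):
    """Summarise the text of /proc/cpuinfo (x86 Linux field names assumed)."""
    logical = 0
    models = Counter()
    sockets = set()
    cores = set()
    mhz = []
    physical_id = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            physical_id = None
        elif key == "model name":
            models[value] += 1
        elif key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id":
            # A physical core is identified by (socket, core id).
            cores.add((physical_id, value))
        elif key == "cpu MHz":
            mhz.append(float(value))
    return {
        "logical_cpus": logical,
        "sockets": len(sockets),
        "physical_cores": len(cores),
        "models": dict(models),
        "min_mhz": min(mhz) if mhz else None,
        "max_mhz": max(mhz) if mhz else None,
    }


def read_governors(
    pattern="/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor",
):
    """Count CPUs per cpufreq governor; empty dict if cpufreq is not exposed."""
    governors = Counter()
    for path in glob.glob(pattern):
        try:
            with open(path) as fh:
                governors[fh.read().strip()] += 1
        except OSError:
            pass  # unreadable entry: skip rather than fail
    return dict(governors)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        print(summarize_cpuinfo(fh.read()))
    print(read_governors())
```

If the two boxes report different logical CPU or core counts, a different governor (say, powersave on one and performance on the other), or a much lower min_mhz under load on the slow box, that would point at guess 1 or 2. It says nothing about memory configuration (guess 3), which normally needs something like dmidecode and root.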

On Thu, Aug 10, 2017 at 5:46 PM, Henry Poras <henry.poras_at_gmail.com> wrote:

> Thanks for all of the suggestions. Here is where I am so far:
>
> Kevin - SLOB was always on my list of things I wanted to try and for some
> reason never got around to it (I don't mean for this problem, I mean going
> back a bunch of years). My question here relates to the fact that I can't
> take these machines off-line to run a test. Doesn't SLOB hammer the
> resources enough that I really need to run it on a test machine, not while
> our system is up and running (poorly)? Going over some of your docs to see.
>
> Tony - I'll ask sysadmins to check, but it's tough without knowing what to
> ask them to look for.
>
> Karl - Sort of like what I looked at in /proc/cpuinfo, but much easier to
> read. After looking again, the two systems look identical from this level.
> Well, almost. cpu MHz is ~0.005% different (2299.908 vs. 2300.032). Doesn't
> seem like enough of a difference to explain my observations.
>
> Mark - both have same hugepage configuration. Same HugePages_Free, Rsvd,
> Total, and size.
>
> Bhavani - I can't run AWR, but I ran snapper on the same query in order to
> compare resource usage, latches, and statistics.
>
> Hans - Looking for differences, but not sure where to look.
>
> MWF - do you know if there is a way to do this without being root?
>
> Stefan - Thanks for the links. Haven't read these in a while. I'll see
> what I can use.
>
> I'll post more if/when I have it.
>
> Henry
>
>
>
> On Thu, Aug 10, 2017 at 11:13 AM, Reen, Elizabeth <elizabeth.reen_at_citi.com> wrote:
>
>> Are the disks set up identically?
>>
>>
>>
>> Liz
>>
>>
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
>> *On Behalf Of *Henry Poras
>> *Sent:* Wednesday, August 09, 2017 5:46 PM
>> *To:* ORACLE-L
>> *Subject:* LIO/sec per CPU limit? Is it Hardware or Oracle code?
>>
>>
>>
>> I have two identical servers (or so I am told), but application work is
>> running 2-3 times slower on one than the other. Using Tanel's snapper, I
>> see that all active sessions are on CPU. Viewing top shows me the same
>> thing: each session pegs a CPU. We also found that it wasn't a particular SQL
>> that slowed down across servers; it looked like everything was slow. A
>> select count(*) from dba_objects showed this behavior as did Jonathan
>> Lewis's kill_cpu script. This gave me something to test with. Running a
>> 10046, I saw the same amount of resource utilization (parse count, fetch
>> count, cr count, ...), no contention (wait events), but one server finished
>> 2.5 times faster than the other. Looking at session stats through snapper,
>> I see that the number of session logical reads per sec (~all of which are
>> consistent reads) is ~ 2.5 times higher on one server than the other. That
>> explains why it takes one longer to finish.
>>
>>
>>
>> So, now what?? Why is one server giving me ~350k consistent gets per
>> second and the other ~800k? Is it hardware? /proc/cpuinfo shows the same
>> cpu for each box. Is it hidden in the Oracle code path? I realize that not
>> all LIO are created equal, but how do I check this? I am running on
>> SE 12.1.0.1.
>>
>>
>>
>> Any and all thoughts welcome.
>>
>>
>>
>> Henry
>>
>
>

-- 
Niall Litchfield
Oracle DBA
http://www.orawin.info

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Aug 10 2017 - 22:53:35 CEST
