Re: LIO/sec per CPU limit? Is it Hardware or Oracle code?

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Thu, 10 Aug 2017 19:06:59 -0400
Message-ID: <86024ed0-88c0-79f2-ef19-fde06b0d80f1_at_gmail.com>



Henry, have you thought of testing IO on both boxes? Something like bonnie++ or SLOB could tell you the differences in the IO characteristics of your systems. Also, if the underlying OS is Linux newer than RH 5.x, you can use atop to see how much IO you are actually doing on the systems.
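
If atop is not available, the same kind of per-device kernel counters can be sampled directly from /proc/diskstats. Below is a minimal Python sketch, assuming the standard Linux diskstats layout (device name in the third column, followed by reads completed, reads merged, sectors read, time reading, writes completed, writes merged, sectors written, ...) and 512-byte sectors. The helper names are hypothetical. It only shows how much IO the workload is doing; it is not a substitute for a bonnie++ or SLOB capacity test.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text):
    """Map device name -> (reads, sectors_read, writes, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:  # skip short legacy partition lines
            continue
        stats[f[2]] = (int(f[3]), int(f[5]), int(f[7]), int(f[9]))
    return stats


def io_rates(before, after, seconds):
    """Per-device IOPS and MB/s between two diskstats snapshots."""
    rates = {}
    for dev, (r1, sr1, w1, sw1) in after.items():
        if dev not in before:
            continue
        r0, sr0, w0, sw0 = before[dev]
        rates[dev] = {
            "read_iops": (r1 - r0) / seconds,
            "write_iops": (w1 - w0) / seconds,
            "read_mb_s": (sr1 - sr0) * SECTOR_BYTES / seconds / 1e6,
            "write_mb_s": (sw1 - sw0) * SECTOR_BYTES / seconds / 1e6,
        }
    return rates


def sample_io(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_diskstats(fh.read())
    return io_rates(before, after, interval)
```

Calling sample_io() on both servers while the same job runs gives directly comparable per-device numbers.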

There is also a distinct possibility that the systems have different memory types. DDR2, DDR3 and DDR4 are very different animals. You can check the memory type using dmidecode --type 17. Here is the result from my machine:

root@umajor:~# dmidecode --type 17
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x0043, DMI type 17, 34 bytes
Memory Device

     Array Handle: 0x0042
     Error Information Handle: Not Provided
     Total Width: 64 bits
     Data Width: 64 bits
     Size: 8192 MB
     Form Factor: DIMM
     Set: None
     Locator: ChannelA-DIMM0
     Bank Locator: BANK 0
     Type: DDR3
     Type Detail: Synchronous
     Speed: 1600 MHz
     Manufacturer: 1315
     Serial Number: 00000000
     Asset Tag: 9876543210
     Part Number: BLS8G3D1609DS1S00.
     Rank: 2
     Configured Clock Speed: 1600 MHz

Handle 0x0044, DMI type 17, 34 bytes
Memory Device

     Array Handle: 0x0042
     Error Information Handle: Not Provided
     Total Width: 64 bits
     Data Width: 64 bits
     Size: 8192 MB
     Form Factor: DIMM
     Set: None
     Locator: ChannelA-DIMM1
     Bank Locator: BANK 1
     Type: DDR3
     Type Detail: Synchronous
     Speed: 1600 MHz
     Manufacturer: 1315
     Serial Number: 00000000
     Asset Tag: 9876543210
     Part Number: BLS8G3D1609DS1S00.
     Rank: 2
     Configured Clock Speed: 1600 MHz

Handle 0x0045, DMI type 17, 34 bytes
Memory Device

     Array Handle: 0x0042
     Error Information Handle: Not Provided
     Total Width: 64 bits
     Data Width: 64 bits
     Size: 8192 MB
     Form Factor: DIMM
     Set: None
     Locator: ChannelB-DIMM0
     Bank Locator: BANK 2
     Type: DDR3
     Type Detail: Synchronous
     Speed: 1600 MHz
     Manufacturer: 1315
     Serial Number: 00000000
     Asset Tag: 9876543210
     Part Number: BLS8G3D1609DS1S00.
     Rank: 2
     Configured Clock Speed: 1600 MHz

Handle 0x0046, DMI type 17, 34 bytes
Memory Device

     Array Handle: 0x0042
     Error Information Handle: Not Provided
     Total Width: 64 bits
     Data Width: 64 bits
     Size: 8192 MB
     Form Factor: DIMM
     Set: None
     Locator: ChannelB-DIMM1
     Bank Locator: BANK 3
     Type: DDR3
     Type Detail: Synchronous
     Speed: 1600 MHz
     Manufacturer: 1315
     Serial Number: 00000000
     Asset Tag: 9876543210
     Part Number: BLS8G3D1609DS1S00.
     Rank: 2
     Configured Clock Speed: 1600 MHz
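
To compare the two boxes without eyeballing the full listing, the output can be reduced to a count of populated DIMMs per type, size and configured speed. This is a minimal Python sketch, assuming the indented "Key: value" layout shown above. Some newer dmidecode releases print "Configured Memory Speed" instead of "Configured Clock Speed", so both keys are tried. The function names are hypothetical helpers.

```python
def parse_dmidecode_memory(text):
    """Split `dmidecode --type 17` output into one dict per Memory Device."""
    devices = []
    current = None
    for line in text.splitlines():
        if line.strip() == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and ":" in line and line[:1] in (" ", "\t"):
            # Indented "Key: value" lines belong to the current device.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return devices


def summarize(devices):
    """Count populated DIMMs by (Type, Size, configured speed)."""
    summary = {}
    for d in devices:
        if d.get("Size", "").startswith("No Module"):
            continue  # empty slot
        speed = d.get("Configured Clock Speed") or d.get("Configured Memory Speed")
        key = (d.get("Type"), d.get("Size"), speed)
        summary[key] = summary.get(key, 0) + 1
    return summary


def diff_summaries(a, b):
    """Return configurations whose DIMM counts differ between two hosts."""
    return {
        k: (a.get(k, 0), b.get(k, 0))
        for k in set(a) | set(b)
        if a.get(k, 0) != b.get(k, 0)
    }
```

Save the dmidecode output from each server, summarize both, and pass them to diff_summaries; an empty result means the populated DIMMs match on type, size and configured speed.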

On my system, I have four 8 GB DDR3 modules, one in each of the four banks. The output also shows the clock speed (1600 MHz here), which can significantly influence memory access speed. You should also check the cache sizes on your machine:

root@umajor:~# lshw -C memory
   *-firmware
        description: BIOS
        vendor: American Megatrends Inc.
        physical id: 0
        version: F6
        date: 06/17/2014
        size: 64KiB
        capacity: 15MiB
        capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi

   *-cache:0
        description: L1 cache
        physical id: 3e
        slot: CPU Internal L1
        size: 256KiB
        capacity: 256KiB
        capabilities: synchronous internal write-back
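
lshw may report an aggregate figure (on a multi-core CPU, the 256KiB above is plausibly the sum of the per-core L1 instruction and data caches), so a per-core view from sysfs can be easier to compare between the two servers. This is a minimal Python sketch, assuming the Linux sysfs layout /sys/devices/system/cpu/cpuN/cache/indexM/ with level, type and size files; the helper names are hypothetical.

```python
import glob
import os


def _read(path):
    with open(path) as fh:
        return fh.read().strip()


def read_cpu_caches(cpu=0, root="/sys/devices/system/cpu"):
    """Return sorted (level, type, size) tuples for the caches seen by one CPU."""
    pattern = os.path.join(root, f"cpu{cpu}", "cache", "index*")
    caches = []
    for index_dir in glob.glob(pattern):
        caches.append((
            int(_read(os.path.join(index_dir, "level"))),
            _read(os.path.join(index_dir, "type")),  # Data, Instruction or Unified
            _read(os.path.join(index_dir, "size")),  # e.g. "32K"
        ))
    return sorted(caches)


def size_to_kib(size):
    """Convert sysfs size strings such as '32K' or '6M' to KiB."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    if size and size[-1] in units:
        return int(size[:-1]) * units[size[-1]]
    return int(size) // 1024  # plain byte count
```

For example, `for level, kind, size in read_cpu_caches(): print(f"L{level} {kind}: {size}")` lists each cache level for CPU 0; running it on both boxes shows whether the per-core sizes actually differ.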

Level 1 cache is the most significant. If the memory address is cached in L1 cache, the CPU doesn't have to go out to the slower L2/L3 caches or to main memory to fetch it. One system having a significantly larger L1 cache than the other would also mean much faster memory access on average.

Basically, the logic is very simple: your system has three main components: CPU, memory and disks. If the CPU is the same, you should compare IO performance (using bonnie++) and memory speed. My assumption is that there is a difference in both of those factors. However, before venturing into that, check paging and swapping on both systems. Paging and swapping are performance killers, and you may have them on one of your systems. Different file systems can also account for the speed degradation. Finally, I wish you good luck. You'll need it.
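
The paging and swapping check can be scripted by sampling the kernel counters in /proc/vmstat while the slow job runs. This is a minimal Python sketch, assuming a Linux kernel that exposes the pswpin, pswpout (pages swapped in/out) and pgmajfault (major page faults) counters; the helper names are hypothetical. Sustained nonzero swap rates on one box and not the other would point at memory pressure rather than the CPU.

```python
import time

# Counters that indicate swapping or pages being read back from disk.
WATCHED = ("pswpin", "pswpout", "pgmajfault")


def parse_vmstat(text):
    """Turn /proc/vmstat ("name value" per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            counters[name] = int(value)
    return counters


def paging_rates(before, after, seconds):
    """Per-second deltas of the watched counters between two snapshots."""
    return {k: (after.get(k, 0) - before.get(k, 0)) / seconds for k in WATCHED}


def sample_paging(interval=10.0, path="/proc/vmstat"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as fh:
        before = parse_vmstat(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_vmstat(fh.read())
    return paging_rates(before, after, interval)
```

The si/so columns of `vmstat 5` give a similar quick view if a script is not convenient.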

On 08/09/2017 05:46 PM, Henry Poras wrote:
> I have two identical servers (or so I am told), but application work
> is running 2-3 times slower on one than the other. Using Tanel's
> snapper, I see that all active sessions are on CPU. Viewing top
> shows me the same thing, each session pegs a cpu. We also found that
> it wasn't particular SQL that slowed down across servers, but it looked
> like everything was slow. A select count(*) from dba_objects showed
> this behavior as did Jonathan Lewis's kill_cpu script. This gave me
> something to test with. Running a 10046, I saw the same amount of
> resource utilization (parse count, fetch count, cr count, ...), no
> contention (wait events), but one server finished 2.5 times faster
> than the other. Looking at session stats through snapper, I see that
> the number of session logical reads per sec (~all of which are
> consistent reads) is ~ 2.5 times higher on one server than the other.
> That explains why it takes one longer to finish.
>
> So, now what?? Why is one server giving me 350k consistent gets per
> second and the other is ~800k? Is it hardware? /proc/cpuinfo shows the
> same cpu for each box. Is it hidden in the Oracle code path? I
> realize that not all LIO are created equal, but how do I check this? I
> am running on SE 12.1.0.1
>
> Any and all thoughts welcome.
>
> Henry

-- 
Mladen Gogala
Oracle DBA
Tel: (347) 321-1217


--
http://www.freelists.org/webpage/oracle-l
Received on Fri Aug 11 2017 - 01:06:59 CEST
