Re: LIO/sec per CPU limit? Is it Hardware or Oracle code?
Date: Thu, 10 Aug 2017 19:06:59 -0400
Message-ID: <86024ed0-88c0-79f2-ef19-fde06b0d80f1_at_gmail.com>
Henry, have you thought of testing IO on both boxes? Something like bonnie++ or SLOB could tell you the differences in the IO characteristics of the two systems. Also, if the underlying OS is Linux newer than RH 5.x, you can use atop to see how much IO you are actually doing on each system.
There is also a distinct possibility of the systems having different memory types. DDR2, DDR3 and DDR4 are very different animals. You can check the memory types using dmidecode --type 17. Here is the result from my machine:
root_at_umajor:~# dmidecode --type 17
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
Handle 0x0043, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0042
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: ChannelA-DIMM0
    Bank Locator: BANK 0
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MHz
    Manufacturer: 1315
    Serial Number: 00000000
    Asset Tag: 9876543210
    Part Number: BLS8G3D1609DS1S00.
    Rank: 2
    Configured Clock Speed: 1600 MHz
Handle 0x0044, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0042
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: ChannelA-DIMM1
    Bank Locator: BANK 1
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MHz
    Manufacturer: 1315
    Serial Number: 00000000
    Asset Tag: 9876543210
    Part Number: BLS8G3D1609DS1S00.
    Rank: 2
    Configured Clock Speed: 1600 MHz
Handle 0x0045, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0042
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: ChannelB-DIMM0
    Bank Locator: BANK 2
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MHz
    Manufacturer: 1315
    Serial Number: 00000000
    Asset Tag: 9876543210
    Part Number: BLS8G3D1609DS1S00.
    Rank: 2
    Configured Clock Speed: 1600 MHz
Handle 0x0046, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0042
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: ChannelB-DIMM1
    Bank Locator: BANK 3
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MHz
    Manufacturer: 1315
    Serial Number: 00000000
    Asset Tag: 9876543210
    Part Number: BLS8G3D1609DS1S00.
    Rank: 2
    Configured Clock Speed: 1600 MHz
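If you want to compare the two boxes side by side rather than eyeball the listings, something like the following rough sketch could condense the dmidecode output to the fields that matter. The function names are my own invention, and the field names are taken from the output above; other dmidecode versions may label some fields differently, so treat the field list as an assumption.

```python
def parse_memory_devices(text):
    """Parse `dmidecode --type 17` output into one dict per memory device."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            # A new DMI record starts; wait for its title line.
            current = None
        elif line == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


def summarize(devices):
    """Keep only the fields worth comparing between two servers."""
    # Field names as they appear in the listing above (assumed, not universal).
    fields = ("Locator", "Type", "Size", "Speed", "Configured Clock Speed")
    return [tuple(d.get(f, "?") for f in fields) for d in devices]
```

One way to use it: save the output on each server with dmidecode --type 17 > mem.txt, feed both files through summarize(parse_memory_devices(...)), and compare the two lists. Any difference in type, speed or the number of populated slots is worth a closer look.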
On my system, I have four 8 GB DDR3 DIMMs. The output also shows the clock speed, which can significantly influence memory access speed. You should also check the cache sizes on your machine:
root_at_umajor:~# lshw -C memory
  *-firmware
       description: BIOS
       vendor: American Megatrends Inc.
       physical id: 0
       version: F6
       date: 06/17/2014
       size: 64KiB
       capacity: 15MiB
       capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
  *-cache:0
       description: L1 cache
       physical id: 3e
       slot: CPU Internal L1
       size: 256KiB
       capacity: 256KiB
       capabilities: synchronous internal write-back
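If lshw is not installed on both boxes, the same information is usually exposed by the Linux kernel under sysfs. Here is a minimal sketch that reads it; the helper names are hypothetical, and I am assuming the usual layout of level, type and size files under /sys/devices/system/cpu/cpu0/cache/index*.

```python
from pathlib import Path


def parse_size(text):
    """Convert a sysfs cache size such as '32K' or '8M' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1].upper() in units:
        return int(text[:-1]) * units[text[-1].upper()]
    return int(text)


def read_cpu_caches(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return (level, type, size_in_bytes) for each cache of one CPU."""
    caches = []
    for index in sorted(Path(cache_dir).glob("index[0-9]*")):
        level = int((index / "level").read_text())
        kind = (index / "type").read_text().strip()
        size = parse_size((index / "size").read_text())
        caches.append((level, kind, size))
    return caches
```

Running read_cpu_caches() on each server and comparing the resulting lists shows at a glance whether the cache hierarchy is really the same on both.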
The level 1 cache matters most. If a memory address is cached in L1, the CPU doesn't have to go out to the slower cache levels or to main memory to fetch it. One system having a significantly larger L1 cache than the other would also mean much faster memory access on average.
Basically, the logic is very simple: your system has 3 main components: CPU, memory and disks. If the CPU is the same, you should compare IO performance (using bonnie++) and memory speed. My assumption is that there is a difference in both of those factors. However, before venturing into that, check paging and swapping on both systems. Paging and swapping are performance killers and you may have them on one of your systems. Different file systems can also account for the speed degradation.
Finally, I wish you good luck. You'll need it.
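P.S. For the paging and swapping check, one possible approach is to sample the pswpin and pswpout counters in /proc/vmstat twice and look at the difference. This is only a sketch under that assumption; the function names are made up for illustration, and tools such as vmstat or sar report the same counters.

```python
import time


def parse_vmstat(text):
    """Turn the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after, keys=("pswpin", "pswpout")):
    """Pages swapped in and out between two snapshots."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def sample_swap(interval=5.0, path="/proc/vmstat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_activity(before, after)
```

Call sample_swap() on each server while the test query is running. Non-zero deltas on the slow box and zeros on the fast one would point at memory pressure rather than at the CPU or Oracle.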
On 08/09/2017 05:46 PM, Henry Poras wrote:
> I have two identical servers (or so I am told), but application work
> is running 2-3 times slower on one than the other. Using Tanel's
> snapper, I see that all active sessions are all on CPU. Viewing top
> shows me the same thing, each session pegs a cpu. We also found that
> it wasn't particular SQL that slowed down across servers, but it looked
> like everything was slow. A select count(*) from dba_objects showed
> this behavior as did Jonathan Lewis's kill_cpu script. This gave me
> something to test with. Running a 10046, I saw the same amount of
> resource utilization (parse count, fetch count, cr count, ...), no
> contention (wait events), but one server finished 2.5 times faster
> than the other. Looking at session stats through snapper, I see that
> the number of session logical reads per sec (~all of which are
> consistent reads) is ~ 2.5 times higher on one server than the other.
> That explains why it takes one longer to finish.
>
> So, now what?? Why is one server giving me 350k consistent gets/per
> second and the other is ~800k? Is it hardware? /proc/cpuinfo shows the
> same cpu for each box. Is it hidden in the Oracle code path? I
> realize that not all LIO are created equal, but how do I check this? I
> am running on SE12.1.0.1
>
> Any and all thoughts welcome.
>
> Henry
--
Mladen Gogala
Oracle DBA
Tel: (347) 321-1217
--
http://www.freelists.org/webpage/oracle-l
Received on Fri Aug 11 2017 - 01:06:59 CEST