RE: OT : kernel using 75% of CPU

From: <Riyaj_Shamsudeen_at_i2.com>
Date: Tue, 28 Aug 2001 15:32:36 -0700
Message-ID: <F001.0037A78D.20010828151550@fatcity.com>

Hi Jerry,
Since most of the CPU is being used in system mode, I would not suspect latch contention at all. If you had latch contention, the usage would show up in user mode rather than system mode. Maybe you or your sysadmin need to check the error logs on the EMC or on the server itself. 75% CPU is way too high; only hardware errors such as disk or controller problems will cause this.
You could truss the Oracle processes with 'truss -p pid' and see what system calls the processes are making; that may give a clue.
Thanks
Riyaj "Re-yas" Shamsudeen
Certified Oracle DBA
i2 technologies www.i2.com

Kevin Lange <kgel_at_ppoone.com>
Sent by: root_at_fatcity.com
08/28/01 05:20 PM
Please respond to ORACLE-L

To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
cc:
Subject: RE: OT : kernel using 75% of CPU

Jerry;
Have there been any system parameter changes lately?

I don't know about on your system, but on our AIX box there was a parameter called MBUFs that dealt with Communication Buffers. Now, you would not think this would have any consideration on the database, but it did.

MBUFS is the Maximum Allowable Communication Buffer on an AIX system. We thought that it dealt with the networking and set it up to its maximum of 64 Meg. We also did not think this would bother Oracle.... boy were we wrong.

Apparently, it covers interprocess communications as well. And, since Oracle is greedy, it likes to take all the memory the system will give it. It turned out that each process that was started (i.e. every user who logged on) grabbed the maximum memory set up by MBUFS. So, take a 900 Meg SGA and add onto it a 1 Meg sort area per user and a 64 Meg MBUF per user and 350 users, and you can see why our 4GB of memory went real fast.
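
Kevin's arithmetic can be worked through as a rough sketch. The figures are the ones in his message; the function name is made up for illustration, 1 GB is taken as 1024 MB, and the assumption that every session really holds its full allocation is the worst case rather than a measured fact.

```python
# Worst-case memory demand for the AIX box described above.
# Figures come from the message; total_demand_mb is an illustrative name.

def total_demand_mb(sga_mb, users, sort_area_mb, mbuf_mb):
    """One SGA plus a sort area and an MBUF allocation for every user."""
    return sga_mb + users * (sort_area_mb + mbuf_mb)

demand = total_demand_mb(sga_mb=900, users=350, sort_area_mb=1, mbuf_mb=64)
physical = 4 * 1024  # 4 GB of RAM, assuming 1 GB = 1024 MB

print(f"demand: {demand} MB, physical: {physical} MB")
print(f"oversubscribed by a factor of {demand / physical:.1f}")
```

The per-user MBUF term dominates: the SGA itself is a small part of the total.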

Sometimes, things that don't appear to be related can all of a sudden jump up and bite you.

Kevin
-----Original Message-----
From: Jerry C [mailto:usidba_at_yahoo.com]
Sent: Tuesday, August 28, 2001 4:46 PM
To: Multiple recipients of list ORACLE-L
Subject: Re: OT : kernel using 75% of CPU

Guy,

Thank you very very much, this is a great explanation, and is much appreciated.

To answer some of your questions (and add a few!):

Yes, our client is experiencing performance problems.

vmstat and swap -s seem to show some swapping:

csuaor46> vmstat 2 10

 procs     memory                   page                  disk          faults      cpu
 r b w    swap  free re   mf pi   po    fr    de   sr s6 s1 s1 s5   in    sy  cs us sy id
 2 0 0   14536 14776 67 1514 15  228   957 62760  262  0  2  2  0  640   115 931 25 25 50
 2 0 0 6179304 62416 36 1893  0  744  5500 56488 1126  0  4  4  0  988  6159 917 23 55 22
 1 0 0 6177696 62600 20  908  0 1032 11808 56488 2488  0  6  6  0  917  3781 667 24 51 25
 0 0 0 6181688 62960 89 1528  4  288   444 56488   37  0  6  6  0 1076 19029 862 23 54 23
 1 0 0 6181336 64432 15 1269  0  140   576 56488  110  0  1  1  0  456  8550 493 14 46 40
 0 0 0 6182376 63776 18 2976  4  368  1008 62760  328  0  8  8  0  594  6163 831 14 54 32
 2 0 0 6180800 63072  9 1746  0  300  1296 62760  202  0  1  1  0  661  4441 693 12 65 23
 0 0 0 6178120 62728 47 1311  4  612  2272 56488  464  0  3  2  0  829  5535 801 34 38 28
 2 0 0 6179944 64616 36 1322  0  364   764 62760   70  0  0  0  0  996  4786 739 13 69 18
 0 0 0 6183112 62560 40  856  4  340  1444 62760  339  0  1  2  0  822  4107 707 10 40 50

csuaor46> swap -s
total: 2602216k bytes allocated + 19960k reserved = 2622176k used, 6177752k available

I would assume swapping operations would be included under "kernel"? The app also uses java, is there any way to determine if Java is performing any wacky system calls?
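
One way to read the vmstat samples is to average the CPU columns and the page-scanner rate (sr) over the interval rows. The sketch below uses three of the rows posted above and skips the first vmstat line, which reports averages since boot. On Solaris a persistently non-zero sr is commonly read as a sign that the page scanner is hunting for free memory, though on this release file system I/O can also drive it, so it is a hint rather than proof. The names HEADER, SAMPLE and summarise are illustrative only.

```python
# Summarise Solaris vmstat interval samples: CPU split and scan rate.
# SAMPLE holds three of the interval rows posted in the message.

HEADER = "r b w swap free re mf pi po fr de sr s6 s1 s1 s5 in sy cs us sy id".split()

SAMPLE = """\
2 0 0 6179304 62416 36 1893 0 744 5500 56488 1126 0 4 4 0 988 6159 917 23 55 22
1 0 0 6177696 62600 20 908 0 1032 11808 56488 2488 0 6 6 0 917 3781 667 24 51 25
0 0 0 6181688 62960 89 1528 4 288 444 56488 37 0 6 6 0 1076 19029 862 23 54 23
"""

def summarise(text):
    """Return mean us/sy/id percentages and mean scan rate over all rows."""
    sr_col = HEADER.index("sr")
    rows = [[int(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    n = len(rows)

    def mean(col):
        return sum(r[col] for r in rows) / n

    # "sy" appears twice in the header (system calls and system CPU %),
    # so the CPU columns are addressed from the end of the row instead.
    return {"us": mean(-3), "sy": mean(-2), "id": mean(-1), "sr": mean(sr_col)}

stats = summarise(SAMPLE)
print(stats)
```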

There are 3 databases on this box, which has 4 Gb. of memory:

csuaor46> ps -ef |grep ora_ |grep smon

  oracle   867     1  0   Aug 16 ?        0:13 ora_smon_tstrn
  oracle   981     1  0   Aug 16 ?        0:14 ora_smon_tsdmo
  oracle 19561     1  0   Aug 23 ?        0:35 ora_smon_tsprd

The main db (tsprd) has an SGA of 1.7 Gb., the other 2 are ~180 Mb. each. - so that's ~2.1 Gb. There are only 49 connections to the 3 databases:

csuaor46> ps -ef |grep LOCAL |wc
49 447 3418

We are not using MTS. Is there any way to determine the amount of real memory that these dedicated connections are using? I can't see how the whole 4 Gb. would be used, causing the system to swap... ?
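
As a back-of-envelope check on that question, the figures above can be turned into a simple budget. This is only a sketch: it assumes 1 Gb means 1024 Mb, and it ignores the OS, the Java application and file system cache, none of which are quantified in the thread.

```python
# Rough memory budget for the Solaris box, using figures from the message.
# Assumption: 1 GB = 1024 MB. OS, Java and file cache usage are not counted.

physical_mb = 4 * 1024
sga_mb = [1.7 * 1024, 180, 180]   # tsprd plus the two smaller instances
connections = 49

left_mb = physical_mb - sum(sga_mb)
per_connection_mb = left_mb / connections

print(f"SGAs total:     {sum(sga_mb):.0f} MB")
print(f"left over:      {left_mb:.0f} MB")
print(f"per connection: {per_connection_mb:.1f} MB before anything else is counted")
```

If the dedicated server processes each use far less than that, the shortfall has to come from something outside the three instances.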

The primary database:
- has an SGA of 1.7GB:
    shared pool 550Mb.
    buffer cache 640Mb.
    java pool 470Mb.! (>460Mb. of which is free)
- logical I/O rate ~3,000 blocks/sec.
- physical I/O rate 500-1,000 I/O/sec (disk is EMC, RAID 1+0 I think)

Everything internal to the db doesn't look that bad, although I'm guessing they don't need so much java pool and the shared pool could be downsized...

Still stumped....

Thanks again.

csuaor46> iostat -xtc 15 20

                               extended device statistics      tty         cpu
device    r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b  tin tout us sy wt id
sd6       0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0    0   18 25 25  2 48
sd11      1.0  1.4   11.3   24.7  0.0  0.1   32.0   0   2 
sd12      1.0  1.4   11.3   24.7  0.0  0.1   45.3   0   2 
sd58      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0 
sd75      0.2  0.4   11.4    3.7  0.0  0.0    1.6   0   0 
sd76      8.9  0.4  335.9    3.2  0.0  0.0    6.3   0   3 
sd77      0.2  0.0    7.0    0.1  0.0  0.0    4.2   0   0 
sd78      0.5  0.1   23.2    1.3  0.0  0.0    6.3   0   0 
sd79      0.0  0.0    1.5    0.2  0.0  0.0    5.3   0   0 
sd135     0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0 
sd152     0.2  0.4   11.4    4.2  0.0  0.0    1.6   0   0 
sd153     9.0  0.4  337.5    3.2  0.0  0.0    5.8   0   3 
sd154     0.2  0.0    6.9    0.1  0.0  0.0    3.8   0   0 
sd155     0.5  0.1   23.2    1.3  0.0  0.0    6.1   0   0 
sd156     0.0  0.0    1.5    0.2  0.0  0.0    5.0   0   0 
sd881     0.2  2.1    5.6    4.3  0.0  0.0   10.8   0   1 
sd882     0.0  0.1    3.5    6.6  0.0  0.0    4.6   0   0 
sd883     0.4  0.0   13.1    0.3  0.0  0.0    8.2   0   0 
sd884     0.2  0.0   10.6    0.1  0.0  0.0    2.5   0   0 
sd885     0.3  0.0   21.9    0.0  0.0  0.0    1.6   0   0 
sd886     2.1  7.6   32.9   64.1  0.0  0.0    5.0   0   2 
sd887     0.5  0.7   26.1   15.9  0.0  0.0    4.7   0   0 
sd888     0.5  0.1   25.0    1.7  0.0  0.0   15.5   0   1 
sd889     0.5  0.1   28.0    1.1  0.0  0.0    7.9   0   0 
sd890     0.7  0.4   31.2    4.1  0.0  0.0    5.2   0   0 
sd891     0.7  0.3   30.8    3.2  0.0  0.0    5.9   0   1 
sd892     0.4  1.3   31.9   33.7  0.0  0.0    3.7   0   0 
sd893     0.2  0.3   17.0   18.7  0.0  0.0    6.0   0   0 
sd894     0.2  0.3   20.1   21.2  0.0  0.0    6.4   0   0 
sd895     3.8  0.9  123.6    7.4  0.0  0.0    8.3   0   2 
sd896     7.3  0.4  292.7    3.1  0.0  0.0    6.9   0   3 
sd897     2.8  0.3  102.4    2.4  0.0  0.0    8.1   0   2 
sd1105    0.2  1.9    2.6    3.9  0.0  0.0    9.9   0   0 
sd1106    0.1  0.1    5.5    6.6  0.0  0.0    5.0   0   0 
sd1107    0.4  0.0   12.8    0.3  0.0  0.0    5.8   0   0 
sd1108    0.2  0.0   10.6    0.1  0.0  0.0    2.3   0   0 
sd1109    0.3  0.0   21.9    0.0  0.0  0.0    1.6   0   0 
sd1110    2.1  8.4   34.3   66.0  0.0  0.1    5.4   0   2 
sd1111    0.5  0.7   26.1   16.8  0.0  0.0    4.6   0   0 
sd1112    0.5  0.1   25.0    1.8  0.0  0.0   14.5   0   1 
sd1113    0.5  0.1   28.0    1.1  0.0  0.0    7.7   0   0 
sd1114    0.7  0.4   31.2    4.0  0.0  0.0    5.1   0   0 
sd1115    0.7  0.3   30.8    3.3  0.0  0.0    5.6   0   1 
sd1116    0.4  1.3   29.2   33.1  0.0  0.0    3.3   0   0 
sd1117    0.2  0.3   17.4   18.7  0.0  0.0    5.9   0   0 
sd1118    0.2  0.3   18.3   21.1  0.0  0.0    6.0   0   0 
sd1119    3.8  0.7  123.5    4.9  0.0  0.0    8.4   0   2 
sd1120    7.2  0.4  292.4    3.2  0.0  0.0    6.8   0   3 
sd1121    2.8  0.3  102.3    2.4  0.0  0.0    7.7   0   2 
nfs1      0.0  0.0    0.0    0.0  0.0  0.0   18.0   0   0
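
To see whether any device stands out in a listing this long, the rows can be ranked by service time (svc_t, in milliseconds). The sketch below uses a handful of rows copied from the output above; the names SAMPLE, FIELDS, parse and slowest are illustrative. Even the slowest devices here are only 1-2% busy, which does not look like saturated disks.

```python
# Rank devices from Solaris `iostat -x` output by average service time.
# SAMPLE is a subset of the rows posted in the message.

SAMPLE = """\
sd11      1.0  1.4   11.3   24.7  0.0  0.1   32.0   0   2
sd12      1.0  1.4   11.3   24.7  0.0  0.1   45.3   0   2
sd76      8.9  0.4  335.9    3.2  0.0  0.0    6.3   0   3
sd888     0.5  0.1   25.0    1.7  0.0  0.0   15.5   0   1
sd1120    7.2  0.4  292.4    3.2  0.0  0.0    6.8   0   3
"""

FIELDS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv", "svc_t", "%w", "%b"]

def parse(text):
    """Yield (device, stats) pairs; header lines and extra columns are skipped."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 1 + len(FIELDS):
            continue
        try:
            values = [float(v) for v in parts[1:1 + len(FIELDS)]]
        except ValueError:
            continue  # not a data row (for example the column header)
        yield parts[0], dict(zip(FIELDS, values))

def slowest(text, n=3):
    """Return the n devices with the highest svc_t, slowest first."""
    return sorted(parse(text), key=lambda d: d[1]["svc_t"], reverse=True)[:n]

for dev, s in slowest(SAMPLE):
    print(f"{dev:8s} svc_t={s['svc_t']:5.1f} ms  busy={s['%b']:.0f}%")
```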

----- Original Message -----
From: Guy Hammond
To: Multiple recipients of list ORACLE-L
Sent: Tuesday, August 28, 2001 1:45 PM
Subject: RE: OT : kernel using 75% of CPU

Hi Jerry,

Firstly, the kernel is not a process in the conventional sense. It is basically a set of library functions. One of these is the scheduler, which gets called every time slice, by the timer in the hardware, in order to decide which actual process to run next. Responding to interrupts is the only way in which a kernel could be considered to be running. The kernel exists to provide services to processes: every time a process makes a "system call", for example to perform I/O, it is invoking a function within the kernel to actually "do" it. For example, an application might call read(), and read() in the kernel would then handle the business of talking to the device driver and actually reading the data from the disk.
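
The user/kernel distinction can be seen from inside a single process. The following is a minimal sketch, not anything from the original system: it times one function that only does arithmetic and one that issues a system call per loop iteration, and reports how the CPU time splits. The function names and loop counts are arbitrary, and the exact numbers will vary by machine.

```python
import os

def cpu_split(work):
    """Run work() and return (user_seconds, system_seconds) it consumed."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system

def compute():
    # Pure arithmetic: the process stays in user mode.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def syscalls():
    # Each os.read() is a system call, so part of the time is charged
    # to the kernel on behalf of this process.
    fd = os.open(os.devnull, os.O_RDONLY)
    try:
        for _ in range(200_000):
            os.read(fd, 1)
    finally:
        os.close(fd)

for name, fn in (("compute", compute), ("syscalls", syscalls)):
    user, system = cpu_split(fn)
    print(f"{name:9s} user={user:.2f}s system={system:.2f}s")
```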

The CPU states line is showing you *where* the code is running. If it's in "user" then the CPU is spending its time running code in "user land" - probably computational code, stuff that's actually in the application. If the state is "kernel", then it means that your application is making lots of system calls, and the kernel level routines are doing the work.

Incidentally, this is why Java is a good language on the server - it does much of its real work in fast kernel space, and little of it in the slow virtual machine. A busy Oracle will also spend a bit of time in kernel space, doing I/O and networking, accessing shared memory, etc.

Looking at your "top" output, you have a high system load and high kernel time, but your user processes aren't using much CPU. This suggests that your processes are spending time waiting for the kernel to do something or other for them, load being the size of the run queue (all the processes that are ready to run but not actually on the CPU). Are you actually experiencing performance problems? If so, you need to look at what the system is doing using "sar", "vmstat" and "iostat". One thing to watch out for is that "top" is a primitive tool. Notice how large all your Oracle processes are? That is because top isn't smart enough to realize that they're all connected to shared memory; it's counting each one as process size + SGA. So your processes and your memory in use don't add up. Also, top deals poorly with LWPs (threads) - are you using MTS? You could simply be seeing threads stacking up as they wait for network.
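
The double counting can be illustrated with the RES figures from the top listing. This sketch assumes each of the large oracle processes is attached to the 1.7 GB tsprd SGA and that the whole SGA is included in its RES value; both are guesses, and 1.7 GB is itself a rounded figure, so the result is an order-of-magnitude check only.

```python
# Estimate private memory per process by subtracting the shared SGA from
# top's RES column. Assumptions: 1 GB = 1024 MB, and every process listed
# here maps the full tsprd SGA.

SGA_MB = 1.7 * 1024

# pid -> RES in MB, for the largest oracle processes in the top output
res_mb = {2286: 1814, 5944: 1789, 11368: 1765,
          19558: 1751, 19554: 1751, 11366: 1763}

private_mb = {pid: res - SGA_MB for pid, res in res_mb.items()}
naive_total = sum(res_mb.values())                    # SGA counted six times
adjusted_total = SGA_MB + sum(private_mb.values())    # SGA counted once

for pid, mb in private_mb.items():
    print(f"pid {pid:5d}: ~{mb:.0f} MB private")
print(f"naive sum of RES: {naive_total} MB; "
      f"counting the SGA once: {adjusted_total:.0f} MB")
```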

HTH,

g

-----Original Message-----
From: Jerry C [mailto:usidba_at_YAHOO.COM]
Sent: Tuesday, August 28, 2001 4:20 PM
To: Multiple recipients of list ORACLE-L
Subject: OT : kernel using 75% of CPU

Hi there,

I have a Sun e4500, running Solaris 2.7 and Oracle 8.1.7.1.0. Everything looks normal from a database perspective, but when I run "top" it shows the kernel being very hog-like:

load averages: 14.38, 15.18, 15.18                                    07:16:21
126 processes: 118 sleeping, 4 running, 4 on cpu
CPU states: 0.6% idle, 26.6% user, 72.8% kernel, 0.0% iowait, 0.0% swap
Memory: 4096M real, 63M free, 216M swap in use, 5310M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 2286 oracle     1   0    0 1844M 1814M run     9:44 13.90% oracle
11068 oracle     1   0    0 2056K 1536K cpu0    0:02  1.53% top
11333 oracle     1   0    0 1150M 1124M cpu1    0:01  1.39% oracle
 5944 oracle     1  40    0 1820M 1789M sleep  14:40  1.36% oracle
 4797 root       1  50    0 2112K 1248K sleep   6:01  1.36% top
11346 oracle     1   0    0  110M   92M cpu0    0:01  1.26% oracle
11114 oracle     1   0    0 1009M  984M cpu1    0:00  0.66% oracle
11157 oracle     1   0    0 1009M  984M run     0:00  0.63% oracle
11368 oracle     1  33    0 1794M 1765M sleep   0:00  0.29% oracle
19558 oracle     1  60    0 1797M 1751M sleep  78:28  0.28% oracle
19554 oracle     1  60    0 1794M 1751M sleep  38:05  0.20% oracle
11366 oracle     1  55    0 1793M 1763M sleep   0:00  0.19% oracle
11292 oracle     1  26    2 2008K 1424K run     0:00  0.19% dsql

Any ideas on what I, as a lowly DBA, would be able to check? It's a bit out of my area and I'm stumped...

Thanks!

Jerry

Received on Tue Aug 28 2001 - 17:32:36 CDT