oracle cannot produce valid timed statistics on linux machine with 2 non-identical CPUs

From: Sergey Lukashevich <lukash33_at_mail.ru>
Date: 29 Jan 2003 09:40:52 -0800
Message-ID: <51f5722e.0301290940.430d5689@posting.google.com>


Below I describe a problem with running Oracle for Linux on an SMP Intel machine whose CPUs report different bogomips-measured speeds. This may be a Linux kernel bug.

First of all, what's wrong:

  1. We cannot get reasonable figures in any of the 'elapsed' columns of any Oracle statistics when timed_statistics is enabled in init.ora. All the figures either display as '##########' or are enormous and completely unrealistic. It does not matter whether we SELECT them from a V$ view, look at TKPROF output, or take a StatsPack snapshot. There is no problem when the machine has only one CPU.
  2. Worse, the Oracle RDBMS itself visibly misbehaves: strange, unresolvable performance problems arise, especially with various latches and waits such as 'free buffer waits'. Users wait and wait and then get stuck. A checkpoint takes VERY long, some 20-30 minutes, while DBWR does almost nothing (we watch 'top') and the I/O load is less than 10% of the capacity of the disk subsystem (we have benchmarked our disks).
  3. There is possibly one more Linux symptom: the 'top' output looks wrong. Sometimes we saw some 5 to 8 processes each consuming 99.9% of CPU. That is impossible with only 2 CPUs!
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  136 root      15   0     0    0     0 RW   99.9  0.0  20:22 kjournald
18767 root      15   0  1928 1868  1552 S    99.9  0.0   0:00 sshd
20601 oracle    15   0  5728 5728  5176 S    99.9  0.2   0:00 oracle
20603 oracle    16   0  192M 192M  190M D    99.9  9.5   0:22 oracle
20605 oracle    17   0  7112 7112  6536 S    99.9  0.3   0:15 oracle
20618 oracle    15   0 56152  54M 55224 S    99.9  2.7   0:12 oracle
22011 oracle    26   0  222M 222M  217M R    99.9 11.0   4:42 oracle
22045 me        15   0   904  904   728 R    99.9  0.0   0:00 top
    1 root      15   0   468  428   412 S     0.0  0.0   0:10 init
    2 root      15   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    4 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU1

  4. The simplest way to determine whether you have this Linux bug is to run the command:

   yes date | bash | uniq

If the result looks like mine then that's the case:

Wed Jan 29 20:22:17 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:19 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:19 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:19 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:19 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:19 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:19 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:19 MSK 2003
Wed Jan 29 20:22:18 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:18 MSK 2003
Wed Jan 29 20:22:16 MSK 2003
Wed Jan 29 20:22:18 MSK 2003
Wed Jan 29 20:22:16 MSK 2003

The date/time continuously jumps forward and backward within a range of a few seconds. Can anyone guess why? Possibly the different CPUs report different date/time values.
But I appreciate that they almost agree with each other ;)
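
For anyone who would rather count the backward steps than eyeball them, here is a minimal sketch. It assumes bash, awk, and a date command that accepts +%s (seconds since the epoch). The function names count_backward_jumps and sample_clock are made up for illustration and are not part of any tool. Since it works with whole seconds, jumps smaller than one second will not show up.

```bash
#!/usr/bin/env bash

# Read one epoch-second timestamp per line on stdin and print how many
# times a value is smaller than the one before it (a backward step).
count_backward_jumps() {
    awk 'NR > 1 && ($1 + 0) < prev { jumps++ }
         { prev = $1 + 0 }
         END { print jumps + 0 }'
}

# Print the current time in epoch seconds N times (default 1000),
# one value per line. Each call forks a new date process, much like
# the "yes date | bash" test does.
sample_clock() {
    local n="${1:-1000}" i
    for ((i = 0; i < n; i++)); do
        date +%s
    done
}

# Usage (not run automatically):
#   sample_clock 5000 | count_backward_jumps
```

On a healthy machine the usage line should print 0. Any other number means the clock was seen going backwards at least that many times during the sampling.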

####

I have found NO information on metalink.oracle.com regarding this issue.
I have found very little information on groups.google.com about the problem:

http://groups.google.com/groups?selm=375BF011.C600C5EB%40best.com&oe=UTF-8&output=gplain

We reproduced all the symptoms on several dual-Pentium machines, and now I think we will have to replace their CPUs.

My hardware is:

Intel-based server with 2 x Pentium III (Coppermine)

>grep bogomips /proc/cpuinfo

bogomips        : 1861.22
bogomips        : 1599.07

but

>grep -i mhz /proc/cpuinfo

cpu MHz         : 932.943
cpu MHz         : 932.943

You can see that both CPUs appear to run at the same speed of 932.943 MHz, but their bogomips values differ.
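
The same comparison can be scripted for checking several machines. This is only a sketch: it assumes a /proc/cpuinfo layout like the one above, with one 'bogomips' line per CPU, and the function name check_bogomips is hypothetical.

```bash
#!/usr/bin/env bash

# Print "differ" if the bogomips lines in a cpuinfo-style file carry more
# than one distinct value, "same" if they all agree, and "none" if the
# file has no bogomips lines. The file defaults to /proc/cpuinfo.
check_bogomips() {
    local file="${1:-/proc/cpuinfo}"
    awk -F': *' '
        tolower($1) ~ /^bogomips/ { seen[$2] = 1 }   # collect distinct values
        END {
            n = 0
            for (v in seen) n++
            if (n == 0)      print "none"
            else if (n == 1) print "same"
            else             print "differ"
        }' "$file"
}

# Usage (not run automatically):
#   check_bogomips              # inspects /proc/cpuinfo
#   check_bogomips saved.txt    # inspects a saved copy
```

Small differences in bogomips between CPUs may be normal calibration noise, so "differ" is a hint to look closer rather than proof of the problem described here.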

My software is:

Linux host.domain 2.4.18-3custom #4 SMP Thu Jan 23 09:14:40 MSK 2003 i686 unknown

Oracle8i Enterprise Edition Release 8.1.7.4.0 - Production

Please let me know whether other sites have seen similar symptoms and whether you consider this to be a Linux kernel bug. Do other OSes have similar problems?

Received on Wed Jan 29 2003 - 11:40:52 CST
