Re: Physical CPU? or multicore?

From: Karl Arao <karlarao_at_gmail.com>
Date: Wed, 6 May 2009 10:10:24 +0800
Message-ID: <12ee65600905051910n3cddcc5doa590f310b508ab6c_at_mail.gmail.com>



I forgot to mention: it was also an upgrade from 9iR2 to 10.2.0.3. We actually hit a listener bug. Once the Oracle Reports server started connecting to the database, subsequent connections took too long to spawn and tnsping was taking around 66300 msec. Oracle Support provided a patch. The client said that fixed it, but they still had slow performance.

I asked the IBM engineer to go through the z/VM performance statistics with us, and we saw that the machine was high on CPU. (Note that when you run cat /proc/cpuinfo on the OS, there's only one processor.) So it could be the z/VM configuration, which allocated just one CPU to that machine. The sar data tells me that the run queue reaches 30+ under heavy OLTP workloads.
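A quick way to put those sar numbers in perspective is to divide the run queue by the number of CPUs the OS image can actually see. The Python below is a minimal sketch I've added, not something from the original thread: `runqueue_per_cpu` is a hypothetical helper, `os.cpu_count()` stands in for counting processors in /proc/cpuinfo, and the `peak_samples` values are invented, shaped only to peak above 30 like the sar data described above. The common rule of thumb assumed here is that a run queue staying well above the CPU count means runnable processes are queuing for CPU.

```python
import os


def runqueue_per_cpu(runq_samples: list[float], cpus: int) -> float:
    """Peak run-queue length divided by the number of CPUs the OS can see.

    A ratio well above 1 means runnable processes were waiting for a CPU.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return max(runq_samples) / cpus


# os.cpu_count() returns the number of logical CPUs this OS image sees (it
# may return None, hence the fallback). On a z/VM guest that is the guest's
# virtual CPUs, not the physical cores in the box.
visible_cpus = os.cpu_count() or 1

# Hypothetical runq-sz samples, as `sar -q` would report them at peak OLTP.
peak_samples = [4, 12, 31, 28, 9]

print(f"CPUs visible to this OS: {visible_cpus}")
print(f"peak run queue per CPU on a 1-CPU guest: {runqueue_per_cpu(peak_samples, 1):.1f}")
print(f"same load spread over 4 CPUs: {runqueue_per_cpu(peak_samples, 4):.1f}")
```

The same peak that means about 31 runnable processes fighting over one CPU would be under 8 per CPU on a 4-CPU image, which is why the CPU count given to the guest matters as much as the speed of each processor.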

Below is the output of the ADDM and AWR reports from that time:

             Database Version: 10.2.0.3.0
               Snapshot Range: from 40 to 41
                Database Time: 81897 seconds
        Average Database Load: 17.3 active sessions

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


FINDING 1: 100% impact (81897 seconds)



Host CPU was a bottleneck and the instance was consuming 100% of the host CPU.
All wait times will be inflated by wait for CPU.

   RECOMMENDATION 1: Host Configuration, 100% benefit (81897 seconds)

      ACTION: Consider adding more CPUs to the host or adding instances
         serving the database on other hosts.
      ACTION: Also consider using Oracle Database Resource Manager to
         prioritize the workload from various consumer groups.

   ADDITIONAL INFORMATION:
      Host CPU consumption was 100%.  CPU runqueue statistics are not
      available from the host's OS. This disables ADDM's ability to estimate
      the impact of this finding.
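ADDM's "Average Database Load" is DB time divided by elapsed wall-clock time, so the two header figures also tell us roughly how long the snapshot interval was and how far the load exceeded a single CPU. This is a small worked example I've added (my arithmetic, not part of the ADDM output), assuming the guest sees one CPU as /proc/cpuinfo showed; the variable names are my own.

```python
# Figures copied from the ADDM report header (snapshots 40 to 41).
db_time_s = 81_897          # "Database Time"
avg_active_sessions = 17.3  # "Average Database Load", already rounded by ADDM

# Average active sessions = DB time / elapsed wall-clock time, so the elapsed
# time of the snapshot interval can be backed out (approximately, because
# 17.3 is a rounded value).
elapsed_s = db_time_s / avg_active_sessions

# Assumption: the guest sees a single CPU, as /proc/cpuinfo showed.
visible_cpus = 1

# Only `visible_cpus` sessions can be running on CPU at any instant, so on
# average the rest are either in a real wait or queued for the CPU.
excess_sessions = avg_active_sessions - visible_cpus

print(f"implied snapshot interval: {elapsed_s:,.0f} s (~{elapsed_s / 60:.0f} min)")
print(f"average active sessions beyond CPU capacity: {excess_sessions:.1f}")
```

That works out to an interval of roughly 4,730 seconds (about 79 minutes) with, on average, around 16 more active sessions than one CPU can run at once.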

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                          4,512           5.5
db file sequential read              77,404       3,321     43    4.1   User I/O
latch: cache buffers chains           8,061       2,817    349    3.4 Concurrenc
db file scattered read               63,818       2,691     42    3.3   User I/O
read by other session                20,435       1,501     73    1.8   User I/O
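As a sanity check on the table, the derived columns can be recomputed from the raw figures: Avg wait is Time / Waits, and here %Total Call Time lines up with Time divided by the 81,897 s of DB time from the ADDM header. The Python below is a worked check I've added (`derived_columns` is my own helper name). It reproduces the printed values and shows that the Top 5 together cover only about 18% of DB time. One plausible reading, consistent with ADDM's note that all wait times are inflated by waiting for CPU, is that much of the unaccounted DB time was spent queued for the single CPU rather than in any Oracle wait event.

```python
db_time_s = 81_897  # "Database Time" from the ADDM header

# (event, waits, time in seconds) copied from the Top 5 Timed Events table.
# CPU time has no wait count, so it is recorded as None.
top5 = [
    ("CPU time", None, 4_512),
    ("db file sequential read", 77_404, 3_321),
    ("latch: cache buffers chains", 8_061, 2_817),
    ("db file scattered read", 63_818, 2_691),
    ("read by other session", 20_435, 1_501),
]


def derived_columns(waits, time_s, total_s):
    """Return (average wait in ms, or None without a wait count; % of total time)."""
    avg_ms = None if waits is None else 1000 * time_s / waits
    return avg_ms, 100 * time_s / total_s


for event, waits, time_s in top5:
    avg_ms, pct = derived_columns(waits, time_s, db_time_s)
    avg_txt = "" if avg_ms is None else f"{avg_ms:.0f}"
    print(f"{event:<30} {avg_txt:>6} {pct:6.1f}")

covered_s = sum(time_s for _, _, time_s in top5)
print(f"Top 5 total: {100 * covered_s / db_time_s:.1f}% of DB time")
```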

We even had ASM instance crashes with ORA-29702 (see Metalink 445023.1), which could be caused either by low memory or by the high run queue. The client said things got better when they added another CPU to the machine.

The point here is that you can't compare processors apples to apples. It depends on the "other factors" that may limit the scalability of the processor. The best way to know which is better is to benchmark your application on the CPU/core itself, which rarely happens (I think).
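To illustrate why a faster processor can still lose on an OLTP workload, here is a textbook M/M/c queueing model in Python (my addition, not from the thread; `erlang_c` and `response_time` are my own names). It deliberately simplifies the customer case in my quoted message: the old host is modeled as 4 servers (2 dual-core Xeons) with a service time of 1 unit, and the new z/VM guest as 1 visible CPU that completes a request in half the time, matching the batch jobs' halved service time. Real CPUs and schedulers are not M/M/c queues, so treat the numbers as a qualitative picture only.

```python
import math


def erlang_c(offered_load: float, servers: int) -> float:
    """Erlang C: probability that an arriving request must queue.

    offered_load is A = arrival_rate / service_rate (in Erlangs); needs A < servers.
    """
    a, c = offered_load, servers
    queued_term = (a ** c / math.factorial(c)) * (c / (c - a))
    no_wait_terms = sum(a ** k / math.factorial(k) for k in range(c))
    return queued_term / (no_wait_terms + queued_term)


def response_time(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean response time (queueing + service) of an M/M/c queue.

    Returns math.inf once arrivals reach total capacity, because the model's
    queue then grows without bound.
    """
    if arrival_rate >= servers * service_rate:
        return math.inf
    offered_load = arrival_rate / service_rate
    queue_wait = erlang_c(offered_load, servers) / (servers * service_rate - arrival_rate)
    return queue_wait + 1.0 / service_rate


# Hypothetical model of the migration described in the quoted message:
# old host = 4 cores, each with a service time of 1 time unit;
# new guest = 1 visible CPU that finishes a request in half the time.
for arrivals in (0.1, 1.5, 1.9, 2.5):
    old = response_time(arrivals, service_rate=1.0, servers=4)
    new = response_time(arrivals, service_rate=2.0, servers=1)
    print(f"arrival rate {arrivals}: old host {old:.2f}, new guest {new:.2f}")
```

At an arrival rate of 0.1 the single fast CPU answers in about half the time of the old host, much like the batch jobs did. At 1.9 it is already roughly nine times slower, and at 2.5 it is saturated while the 4-core model still copes. That is the scalability limit a per-core speed comparison hides.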

On Wed, May 6, 2009 at 1:01 AM, Allen, Brandon <Brandon.Allen_at_oneneck.com> wrote:

> Hopefully you took a look at the waits first before coming to the
> conclusion that the problem was CPU-based, and then if the waits (or lack
> thereof) indicated CPU was your bottleneck, then you looked at the top
> consumers of CPU, most likely queries with high buffer gets, and then
> checked to see if there were any changes or problems with their explain
> plans?
>
> Based strictly on the info below, it looks like you may have jumped to some
> invalid conclusions.
>
> Regards,
> Brandon
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] *On Behalf Of* Karl Arao
>
> I had a customer who migrated their database server from Windows on 2
> dual-core Xeon processors to SUSE Linux (on z/VM) on an IBM System z with
> 1 processor (I believe it's quad-core, with a bigger cache and a faster
> clock speed). On the first day of production, the run queue reached 70+ as
> soon as the OLTP users started to come in. When I was troubleshooting, my
> culprit was the CPU: it could not keep up with the arrival rate of the
> transactions. The company, having invested a large amount of money, was
> saying “it's an Oracle bug”. But during the off-peak period, when they were
> processing their batch jobs, most of the jobs had their service time cut in
> half. Clearly the number of processors is not enough for the OLTP workload,
> a problem they had never encountered on their 2 dual-core Xeons. Kinda
> weird :)
--
http://www.freelists.org/webpage/oracle-l
Received on Tue May 05 2009 - 21:10:24 CDT
