RE: Higher CPU Utilisation on failover node under same workload

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 7 Nov 2014 05:12:39 -0500
Message-ID: <136901cffa73$5d1e8ce0$175ba6a0$_at_rsiz.com>



One NULL hypothesis down then.  

Okay, so next up would be an asymmetry in the way the storage is mounted or used by the two nodes, or a difference in some parameter settings that are read from storage local to each node.  

I'm thinking of something like the network pathways to the storage, or various OS level system parameters that might enable less CPU intensive reading on one node than on the other.

If some big stuff is being dragged into the buffer cache on one system but is read with direct read into the PGA on the other, that could change CPU utilization.  

Apart from Oracle, are there any parameter files regarding the storage that live on local storage on the nodes? Not knowing your storage complex at all, that is a fishing expedition. Since the whole storage is moved, I *think* the difference must be rooted either in something like the wiring to the storage or in a parameter file stored locally.  

Since it is the whole storage being remounted, that rules out a lot of possibilities, such as parameter file differences in Oracle. Even the double underbar stuff will be identical, because the parameter file itself is the same file (unless your parameter files are on local storage).  
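
A mechanical way to check for locally stored parameter differences is to dump the settings on each node and diff them. The following is a minimal sketch in Python, assuming the dumps are plain `key = value` text (the layout `sysctl -a` prints on Linux). The function names are made up for illustration, and nothing here is specific to this cluster.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(node_a, node_b):
    """Return {key: (value_on_a, value_on_b)} for every key that differs.

    A key that is missing on one node is reported with None on that side.
    """
    diffs = {}
    for key in sorted(set(node_a) | set(node_b)):
        a, b = node_a.get(key), node_b.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs
```

Feeding it the two dumps returns only the keys whose values differ, which keeps the comparison from being done by eye.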

JL put up an evolving laundry list of "Nothing changed, so why is it different?" issues on his blog and solicited group-sourced feedback a while back. That is worth a look. Applying the thought "Why guess when you can know?" is difficult when the scope of the change is the whole system. Is it possible to narrow the differential CPU burn to something more specific?  
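
One way to narrow it down, sketched here under assumptions: capture CPU seconds per statement (or per session, or per OS process) over the same workload window on each node, then rank the items by the difference. The input dictionaries and the name `rank_cpu_deltas` are hypothetical; how the numbers are collected (AWR, Statspack, OS tools) is left open.

```python
def rank_cpu_deltas(primary, failover, top_n=10):
    """Rank items by how much more CPU they burned on the failover node.

    primary, failover: dicts mapping an identifier (e.g. a SQL id) to CPU
    seconds measured over comparable windows on each node.
    Returns a list of (identifier, primary_cpu, failover_cpu, delta),
    largest delta first. Items missing on one node count as 0.0 there.
    """
    rows = []
    for key in set(primary) | set(failover):
        p = primary.get(key, 0.0)
        f = failover.get(key, 0.0)
        rows.append((key, p, f, f - p))
    # Sort by delta descending; break ties on the identifier for stable output.
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows[:top_n]
```

If a handful of items account for most of the delta, the problem is specific; if the extra CPU is spread evenly across everything, that points back at the OS or storage layer.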

Good luck, and thanks for the kind words even though my previous suggestion was not your solution,  

mwf  

From: Osborne, Chris [mailto:Chris.Osborne_at_bskyb.com]
Sent: Friday, November 07, 2014 4:11 AM
To: Mark W. Farnham; fuzzy.graybeard_at_gmail.com; oracle-l_at_freelists.org
Subject: RE: Higher CPU Utilisation on failover node under same workload  

Hi Mark,  

Thanks for that. I had considered it, but when we shut the instance down on the failover node and start up again on the other node, we do not see the problem.

If this was an issue of 'warming up' the DB, we would see it regardless of which direction we were moving. Additionally, we don't see the issue being alleviated when the system has been up for a few days.

As I said it's a two node VCS cluster so when we fail over it's a database shutdown on one node, take the storage offline on that node, bring the storage up on the other node and start the instance. This is true regardless of whether we're going from node a to node b, or vice versa.  

I do like your suggestion for a 'failover startup kit' in general, though.  

Chris        

Christopher Osborne

Lead Technical Specialist, Performance Engineering

British Sky Broadcasting

Email:chris.osborne_at_bskyb.com

Desk: +44 1506 325069 | Mobile: +44 7720 308941

Please note new Mobile number.  


From: Mark W. Farnham [mailto:mwf_at_rsiz.com]
Sent: 06 November 2014 16:49
To: Osborne, Chris; fuzzy.graybeard_at_gmail.com; oracle-l_at_freelists.org
Subject: RE: Higher CPU Utilisation on failover node under same workload  

I would say the NULL hypothesis is that the failover node never reaches steady state compared to the normal production workload.  

As such, all the caching of SQL, packages, and procedures that takes place in the shared pool, Java in the Java pool, and data in the buffer cache is burning CPU above the normal workload, EVEN if no extra user transaction load is taking place.  

A partial cure for this is included in my unwritten (likely never to be written) book about planning for business continuation.  

The synopsis relevant to you is: Mine your shared pool for stored procedures and all the read-only queries. (Planning for updating rows using canonical special values should only be attempted at the postgraduate level.) Mine your buffer cache for slowly changing objects. Build yourself a failover startup kit that runs those procedures and queries as soon as you start the database but before you turn the users loose. Do all the SYS and SYSTEM owned packages and procedures first. Remember to do the double (or more) hit and to implement something to avoid direct read for things you want to stay in the buffer cache.  
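
The ordering part of such a kit could be sketched roughly as follows. This is an illustration only, in Python, and it makes assumptions that the text above does not spell out: that the mined objects arrive as `(owner, name)` pairs, and that the "double hit" means running the whole list more than once. The name `build_warmup_plan` is hypothetical, and actually executing each item against the database is not shown.

```python
def build_warmup_plan(objects, hits=2):
    """Order warm-up items with SYS and SYSTEM first, repeated `hits` times.

    objects: iterable of (owner, name) pairs mined from the running system.
    Returns a list of (owner, name) pairs in the order they should be run.
    """
    priority_owners = {"SYS", "SYSTEM"}
    ordered = sorted(
        objects,
        # False sorts before True, so priority owners come first.
        key=lambda o: (o[0].upper() not in priority_owners, o[0].upper(), o[1]),
    )
    plan = []
    for _ in range(hits):
        # Assumption: each full pass counts as one "hit" on every item.
        plan.extend(ordered)
    return plan
```

The plan would be run after startup and before users are let in; whether two passes are enough is something to measure rather than assume.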

Hmm. I wonder if I can make a 45 minute presentation out of that.  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Osborne, Chris
Sent: Thursday, November 06, 2014 10:40 AM
To: fuzzy.graybeard_at_gmail.com; oracle-l_at_freelists.org
Subject: RE: Higher CPU Utilisation on failover node under same workload  

Immediate. And yes it completes successfully.  

Chris


From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Hans Forbrich
Sent: 06 November 2014 14:53
To: oracle-l_at_freelists.org
Subject: Re: Higher CPU Utilisation on failover node under same workload  

On 06/11/2014 3:13 AM, Osborne, Chris wrote:

When we fail over, it's a database shutdown,

Just to confirm - what 'kind' of shutdown are you using, and if a 'clean' one, does it complete?

/Hans

Information in this email including any attachments may be privileged, confidential and is intended exclusively for the addressee. The views expressed may not be official policy, but the personal views of the originator. If you have received it in error, please notify the sender by return e-mail and delete it from your system. You should not reproduce, distribute, store, retransmit, use or disclose its contents to anyone. Please note we reserve the right to monitor all e-mail communication through our internal and external networks. SKY and the SKY marks are trademarks of British Sky Broadcasting Group plc and Sky International AG and are used under licence. British Sky Broadcasting Limited (Registration No. 2906991), Sky-In-Home Service Limited (Registration No. 2067075) and Sky Subscribers Services Limited (Registration No. 2340150) are direct or indirect subsidiaries of British Sky Broadcasting Group plc (Registration No. 2247735). All of the companies mentioned in this paragraph are incorporated in England and Wales and share the same registered office at Grant Way, Isleworth, Middlesex TW7 5QD.




--
http://www.freelists.org/webpage/oracle-l


Received on Fri Nov 07 2014 - 11:12:39 CET
