Re: Oracle RAC and IRQ Balance

From: Greg Rahn <greg_at_structureddata.org>
Date: Sun, 9 Oct 2011 17:35:34 -0700
Message-ID: <CAGXkmivUtN9Re6KkZihuopfufGBVHBxWe2Dcdq6sncJVQWOEBQ_at_mail.gmail.com>



A few things:
- Just for clarity - this isn't RAC specific. The issue of "burning" an entire core/thread on interrupts can happen on any system sending enough packets. I've seen it plenty of times on network interfaces to chatty application tiers.
- Even though that core/thread shows 41.33% user, its %idle is only 4.08%, so this little guy is almost out of gas.
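A quick way to spot the situation described above (one CPU fielding nearly every interrupt while its %idle approaches zero) is to scan `mpstat -P ALL` output. The sketch below assumes the sysstat column layout quoted in Jed's message further down: a header row containing `CPU`, `%idle` and `intr/s`. The 90% interrupt-share and 10% idle thresholds are arbitrary assumptions, not Oracle or Red Hat guidance. Newer sysstat releases may report interrupts only via `mpstat -I`, in which case this script reports nothing.

```python
"""Flag CPUs that take most interrupts while running out of idle time.

Sketch only: assumes sysstat `mpstat -P ALL` text output whose header row
contains a `CPU` column followed by `%idle` and `intr/s` columns.
"""
import sys

IRQ_SHARE_LIMIT = 0.90  # assumed: CPU handles more than 90% of all intr/s
IDLE_LIMIT = 10.0       # assumed: under 10% idle counts as "out of gas"


def parse_mpstat(text):
    """Return {cpu_label: {column_name: value}} for each numeric row."""
    header = None
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if "CPU" in fields:
            # Column names follow the CPU token; timestamps precede it.
            header = fields[fields.index("CPU") + 1:]
            continue
        if not header or len(fields) < len(header) + 1:
            continue
        # The last len(header) fields are values; the CPU label is just before.
        cpu = fields[-len(header) - 1]
        try:
            rows[cpu] = dict(zip(header, map(float, fields[-len(header):])))
        except ValueError:
            continue  # not a numeric data row
    return rows


def overloaded_irq_cpus(rows):
    """Return CPU labels that handle most interrupts with little idle left."""
    total = rows.get("all", {}).get("intr/s")
    if not total:
        return []  # no intr/s column (or no interrupts) in this output
    flagged = []
    for cpu, cols in rows.items():
        if cpu == "all":
            continue
        share = cols["intr/s"] / total
        if share > IRQ_SHARE_LIMIT and cols["%idle"] < IDLE_LIMIT:
            flagged.append(cpu)
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        stats = parse_mpstat(f.read())
    for cpu in overloaded_irq_cpus(stats):
        c = stats[cpu]
        print(f"CPU {cpu}: {c['intr/s']:.0f} intr/s, only {c['%idle']:.2f}% idle")
```

Saved under a hypothetical name such as `irq_check.py` and run as `python3 irq_check.py mpstat.txt` against a capture of `mpstat -P ALL 2 5`, it flags only CPU 0 for the sample in this thread. When several samples are captured, later rows (including the `Average:` block) overwrite earlier ones.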

As long as 1 core/thread doesn't run out of gas, this shouldn't be an issue, but in this case, it's pretty darn close -- too close for my comfort. I'd recommend enabling irqbalance and monitoring the workload & sys metrics carefully. More details on this can be found at http://irqbalance.org/
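For the monitoring part, one option that needs no extra tools is to sample `/proc/interrupts` twice and compare per-CPU counts: before irqbalance is enabled the deltas should pile up on one CPU, and afterwards they should spread out. This is a rough sketch that assumes the usual layout (a header row of `CPUn` names numbered contiguously from `CPU0`, then one row per interrupt source with one count per CPU). The system-wide `ERR:`/`MIS:` rows are skipped, and the sampling interval is whatever you pass on the command line.

```python
"""Per-CPU interrupt rates from two /proc/interrupts snapshots (sketch)."""
import sys
import time


def parse_interrupts(text):
    """Sum the per-CPU columns of /proc/interrupts over all interrupt rows."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        parts = line.split()
        if not parts or not parts[0].endswith(":"):
            continue
        if parts[0] in ("ERR:", "MIS:"):
            continue  # system-wide totals, not per-CPU counts
        counts = parts[1:1 + ncpu]
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue  # malformed or unexpected row
        for i, count in enumerate(counts):
            totals[i] += int(count)
    return totals


def read_interrupts(path="/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read())


def per_cpu_rates(interval):
    """Sample twice, `interval` seconds apart; return interrupts/s per CPU."""
    before = read_interrupts()
    time.sleep(interval)
    after = read_interrupts()
    return [(a - b) / interval for a, b in zip(after, before)]


if __name__ == "__main__" and len(sys.argv) > 1:
    for cpu, rate in enumerate(per_cpu_rates(float(sys.argv[1]))):
        print(f"CPU{cpu}: {rate:10.1f} intr/s")
```

Running it (for example with an interval of `5`) once before and once after `service irqbalance start` gives a direct before/after comparison of how the interrupt load is distributed.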

You may find that collectl [http://collectl.sourceforge.net/] comes in handy for gathering & monitoring this sys metric (and others!). My mantra on collectl is: "If your OS is Linux and you are not using collectl, you probably should be." (I'm a big fan.)

Cheers,

On Sun, Oct 9, 2011 at 10:00 AM, Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com> wrote:
> Hello Jed
>   NICs interrupt the CPU to deliver packets. Of course, in a busy RAC
> database a huge number of network packets can be transferred, leading
> to a high IRQ rate. If the IRQs are pinned to one CPU, then latency on
> that CPU can cause issues, because the kernel threads that service those
> IRQs can only be scheduled on that CPU.
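On Linux, which CPUs may service a given IRQ is controlled by the hexadecimal bitmask in `/proc/irq/<N>/smp_affinity` (bit n set means CPU n is allowed). A small helper to build that mask from a CPU list is sketched below; the comma-separated 32-bit grouping mirrors how the kernel displays the file on larger systems, and `<N>` stays a placeholder rather than an IRQ number from Jed's node. Writing the file requires root, and a running irqbalance daemon may later overwrite whatever you set.

```python
def cpus_to_affinity_mask(cpus):
    """Build an smp_affinity-style hex mask: bit n set => CPU n allowed.

    The mask is emitted as comma-separated 32-bit words, most significant
    word first, each zero-padded to 8 hex digits.
    """
    if not cpus:
        raise ValueError("need at least one CPU")
    if any(c < 0 for c in cpus):
        raise ValueError("CPU numbers must be non-negative")
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu  # set the bit for this CPU
    nwords = max(cpus) // 32 + 1
    words = [(bits >> (32 * i)) & 0xFFFFFFFF for i in range(nwords)]
    return ",".join(f"{w:08x}" for w in reversed(words))
```

For example, `cpus_to_affinity_mask([0])` gives `00000001` (pin to CPU 0), while `cpus_to_affinity_mask([1, 2, 3])` gives `0000000e`, which would keep that IRQ off CPU 0 on a 4-CPU box.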
>   If you want IRQs to be pinned to one CPU, then you should make sure that
> no other process is scheduled to execute on that CPU. But I see about 40%
> USER-mode usage on that CPU, which indicates that this is probably not
> happening in your case.
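If you do dedicate a CPU to interrupts, ordinary processes have to be kept off it. The `isolcpus=` boot parameter or `taskset` are the usual tools; the sketch below shows the same idea from Python with `os.sched_setaffinity` (Linux-only, Python 3.3+). Treating CPU 0 as the reserved IRQ CPU is an assumption based on the mpstat output in this thread, and the helper names are hypothetical. It only changes the calling process, not other processes or kernel threads.

```python
import os

IRQ_CPU = 0  # assumption: the CPU servicing the NIC interrupts (CPU 0 here)


def cpus_without(allowed, reserved):
    """Return the CPUs in `allowed` that are not in `reserved`."""
    remaining = set(allowed) - set(reserved)
    if not remaining:
        raise ValueError("no CPUs left once the reserved ones are removed")
    return remaining


def move_self_off_irq_cpu(irq_cpu=IRQ_CPU):
    """Restrict the calling process (pid 0) to every allowed CPU but irq_cpu.

    Linux-only. Children forked afterwards inherit the new affinity; other
    processes and kernel threads are not affected.
    """
    new_mask = cpus_without(os.sched_getaffinity(0), {irq_cpu})
    os.sched_setaffinity(0, new_mask)
    return new_mask
```

Calling `move_self_off_irq_cpu()` at the start of a launcher script and starting the workload from it is roughly equivalent to `taskset -c 1-3 <command>` on a 4-CPU box like Jed's.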
>  But why is this important for you? Do you see network delays causing RAC
> performance issues? If so, I don't see any problem with having IRQs
> serviced by all CPUs. Also, I am surprised that this is not the default.
>
>
> On Mon, Oct 3, 2011 at 4:29 PM, Walker, Jed S
> <Jed_Walker_at_cable.comcast.com> wrote:
>
>> Back to learning RAC. Today it was suggested that we turn on
>> irqbalance on our Oracle 11.2.0 RAC systems to help distribute the IRQ
>> load and, hopefully, improve performance. I checked, and it appears that
>> just one CPU is handling all of them.
>> mpstat -P ALL 2
>> Linux 2.6.18-53.el5 (node-01)         10/03/2011
>>
>> 09:19:46 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
>> 09:19:48 PM  all   14.30    0.00    3.04   23.54    0.25    1.27    0.00   57.59  10903.06
>> 09:19:48 PM    0   41.33    0.00    9.18   40.31    1.02    4.08    0.00    4.08  10902.55
>> 09:19:48 PM    1    2.55    0.00    0.51   14.29    0.00    0.00    0.00   82.65      0.00
>> 09:19:48 PM    2   12.24    0.00    2.04   34.18    0.00    0.00    0.00   52.04      0.00
>> 09:19:48 PM    3    1.02    0.00    0.51    6.63    0.00    0.00    0.00   92.35      0.00
>> (this is consistent over a period of time)
>>
>> I then read an article saying that in many cases this doesn't matter,
>> something to do with processes being pinned to a CPU (sorry, I can't find
>> the article again!).
>>
>> Does anyone have experience with this, or is there a recommended practice
>> for irqbalance on RAC?
>>
>> service irqbalance start
>> chkconfig irqbalance on
>>
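After `service irqbalance start`, one way to check that interrupts are no longer tied to a single CPU is to read back each `/proc/irq/<N>/smp_affinity` mask and decode it. A rough sketch follows; it assumes the hex, optionally comma-grouped format the kernel uses for that file and silently skips entries it cannot read. Note that irqbalance may place an individual IRQ on one CPU and move it later, so repeated samples of `/proc/interrupts` are a better measure of the spread than a single snapshot of masks.

```python
"""Decode /proc/irq/<N>/smp_affinity masks into CPU lists (sketch)."""
import os


def affinity_mask_to_cpus(mask):
    """Turn a hex mask such as '0000000e' into a CPU list such as [1, 2, 3].

    Comma-separated 32-bit words are simply concatenated, which assumes each
    word is zero-padded to 8 hex digits, as the kernel prints them.
    """
    bits = int(mask.strip().replace(",", ""), 16)
    cpus = []
    cpu = 0
    while bits:
        if bits & 1:
            cpus.append(cpu)
        bits >>= 1
        cpu += 1
    return cpus


def irq_affinities(root="/proc/irq"):
    """Return {irq_number: [cpus]} for every readable smp_affinity file."""
    try:
        names = os.listdir(root)
    except OSError:
        return {}
    result = {}
    for name in names:
        if not name.isdigit():
            continue  # skips entries such as default_smp_affinity
        try:
            with open(os.path.join(root, name, "smp_affinity")) as f:
                result[int(name)] = affinity_mask_to_cpus(f.read())
        except (OSError, ValueError):
            continue
    return result


if __name__ == "__main__":
    for irq, cpus in sorted(irq_affinities().items()):
        print(f"IRQ {irq:>4}: CPUs {cpus}")
```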

-- 
Regards,
Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l
Received on Sun Oct 09 2011 - 19:35:34 CDT
