Re: Oracle RAC and IRQ Balance

From: Riyaj Shamsudeen <riyaj.shamsudeen_at_gmail.com>
Date: Sun, 9 Oct 2011 20:16:20 -0500
Message-ID: <CAA2Dszyq8zamH3azAD_wedLv4z8SCT8OrnSyggTLkXDE_xBXPA_at_mail.gmail.com>



Hi

   Agreed that a core being bombarded with interrupts can happen with any application that generates a high interrupt rate.

   But in this case, on CPU 0, user% is 41.33% and iowait% is about 40%. As you very well know, iowait% does not represent busy time: it simply indicates that an I/O request was outstanding while the CPU was idle. So there are still some CPU cycles left on that CPU.
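
   To make the arithmetic concrete, here is a rough sketch in Python that splits the CPU 0 row of the mpstat sample quoted below into busy time and headroom. The function names are mine, and treating %idle plus %iowait as headroom is the assumption argued above, not something mpstat reports directly.

```python
# CPU 0 figures copied from the mpstat sample quoted later in this thread.
cpu0 = {"user": 41.33, "nice": 0.00, "sys": 9.18, "iowait": 40.31,
        "irq": 1.02, "soft": 4.08, "steal": 0.00, "idle": 4.08}


def headroom(row):
    """Percentage of time the CPU was not executing anything.

    Assumption: %iowait is idle time during which an I/O request was
    outstanding, so it is counted together with %idle.
    """
    return row["idle"] + row["iowait"]


def busy(row):
    """Percentage of time the CPU was actually executing something."""
    return sum(v for k, v in row.items() if k not in ("idle", "iowait"))


print(f"CPU 0 busy:     {busy(cpu0):.2f}%")
print(f"CPU 0 headroom: {headroom(cpu0):.2f}%")
```

   Looking only at %idle, the CPU appears almost exhausted; counting %iowait as idle, a little under half of it is still available.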

   Considering that iowait is 40%, these interrupts could potentially be due to I/O as well.
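
   One way to check whether the interrupts on CPU 0 come from the NIC or from storage is to look at /proc/interrupts, which lists a per-CPU count for every IRQ. The Python sketch below ranks the sources for one CPU. The SAMPLE text, including its device names and counts, is made up purely to exercise the parser; on a real node you would pass in the contents of /proc/interrupts instead. The function names are also mine.

```python
# Hypothetical /proc/interrupts excerpt: counts and device names are invented.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:     500000          0          0          0   IO-APIC-edge      timer
 58:    9000000          0          0          0   PCI-MSI           eth0
 66:    1200000          0          0          0   IO-APIC-level     qla2xxx
NMI:          0          0          0          0
ERR:          0
"""


def per_cpu_irq_counts(text):
    """Parse /proc/interrupts-style text into {label: (counts, description)}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break  # some rows carry fewer counters than there are CPUs
            counts.append(int(tok))
        description = " ".join(fields[len(counts):])
        result[label.strip()] = (counts, description)
    return result


def busiest_sources(text, cpu, top=3):
    """Return the top interrupt sources for one CPU as (count, label, description)."""
    parsed = per_cpu_irq_counts(text)
    ranked = sorted(
        ((counts[cpu], label, desc)
         for label, (counts, desc) in parsed.items()
         if len(counts) > cpu),
        reverse=True,
    )
    return ranked[:top]


for count, label, desc in busiest_sources(SAMPLE, cpu=0):
    print(f"IRQ {label:>4}  {count:>10}  {desc}")
```

   The counters are cumulative since boot, so comparing two snapshots taken a few seconds apart gives a better picture of the current rate than a single reading.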

   So, I am cool with irqbalance, but for these reasons: (i) LMS and VKTM run at elevated priority and can preempt kernel threads, so with irqbalance the probability of kernel thread preemption decreases; (ii) since the OP is asking the question, maybe the OP has a performance issue to resolve and has potentially been able to tie it to CPU latency.

Cheers

Riyaj Shamsudeen
Principal DBA,
Ora!nternals - http://www.orainternals.com - Specialists in Performance, RAC and EBS11i
Blog: http://orainternals.wordpress.com
OakTable member http://www.oaktable.com

Co-author of the books: Expert Oracle Practices <http://tinyurl.com/book-expert-oracle-practices/>,
Pro Oracle SQL, Expert PL/SQL Practices <http://tinyurl.com/book-expert-plsql-practices>

On Sun, Oct 9, 2011 at 7:35 PM, Greg Rahn <greg_at_structureddata.org> wrote:

> A few things:
> - Just for clarity - this isn't RAC specific. The issue of "burning"
> an entire core/thread on interrupts can happen on any system sending
> enough packets. I've seen it plenty of times on network interfaces to
> chatty application tiers.
> - Even though on said core/thread there is 41.33% user, the %idle is
> only 4.08% - so this little guy is almost out of gas.
>
> As long as 1 core/thread doesn't run out of gas, this shouldn't be an
> issue, but in this case, it's pretty darn close -- too close for my
> comfort. I'd recommend enabling irqbalance and monitoring the
> workload & sys metrics carefully. More details on this can be found
> at http://irqbalance.org/
>
> You may find that collectl [http://collectl.sourceforge.net/] comes in
> handy for gathering & monitoring this sys metric (and others!).
> My mantra on collectl is: "If your OS is Linux and you are not using
> collectl, you probably should be." (I'm a big fan.)
>
> Cheers,
>
> On Sun, Oct 9, 2011 at 10:00 AM, Riyaj Shamsudeen
> <riyaj.shamsudeen_at_gmail.com> wrote:
> > Hello Jed
> > NIC cards interrupt CPU for the packet delivery. Of course, in a busy RAC
> > database, there can be huge amount of network packets being transferred
> > leading to high IRQs. If IRQs are pinned to be interrupted to one CPU, then
> > latency in that CPU can cause issues as kernel threads need to be scheduled
> > to serve the irqs only in that CPU.
> > If you want IRQs to be pinned to one CPU, then you should make sure that
> > no other process is scheduled to execute in that CPU. But, I see that 40% of
> > usage in CPU in USER mode which indicates that this is probably not
> > happening in your case.
> > But, why is this important for you? Do you see network delays causing RAC
> > performance issues? If yes, then I don't see an issue of IRQs being serviced
> > by all CPUs. Also, I am surprised that this is not a default.
> >
> >
> > On Mon, Oct 3, 2011 at 4:29 PM, Walker, Jed S
> > <Jed_Walker_at_cable.comcast.com> wrote:
> >
> >> Back to my learning of RAC. Today, it was suggested that we turn on
> >> IRQBALANCE on our Oracle 11.2.0 RAC systems to help distribute the IRQ load,
> >> to hopefully help with performance. I did a check and can see that just one
> >> CPU appears to be handling all of these.
> >> mpstat -P ALL 2
> >> Linux 2.6.18-53.el5 (node-01) 10/03/2011
> >>
> >> 09:19:46 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> >> 09:19:48 PM  all   14.30    0.00    3.04   23.54    0.25    1.27    0.00   57.59  10903.06
> >> 09:19:48 PM    0   41.33    0.00    9.18   40.31    1.02    4.08    0.00    4.08  10902.55
> >> 09:19:48 PM    1    2.55    0.00    0.51   14.29    0.00    0.00    0.00   82.65      0.00
> >> 09:19:48 PM    2   12.24    0.00    2.04   34.18    0.00    0.00    0.00   52.04      0.00
> >> 09:19:48 PM    3    1.02    0.00    0.51    6.63    0.00    0.00    0.00   92.35      0.00
> >> (this is consistent over a period of time)
> >>
> >> I then read an article saying that in many cases this doesn't matter -
> >> something to do with processes being pinned to a CPU (Sorry, I can't find
> >> the article again!).
> >>
> >> Does anyone have any experience, or is there a good practice for this and
> >> RAC?
> >>
> >> service irqbalance start
> >> chkconfig irqbalance on
> >>
>
> --
> Regards,
> Greg Rahn
> http://structureddata.org
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Oct 09 2011 - 20:16:20 CDT