Re: RAC interconnect packet size??

From: John Kanagaraj <john.kanagaraj_at_gmail.com>
Date: Wed, 22 Apr 2009 22:34:42 -0700
Message-ID: <2ead3a60904222234v2b574659r83c1884cb93667e1_at_mail.gmail.com>



Hi Mark (and Joe!),

In the "wait and see" mode, you might want to track the "fragments dropped after timeout" and "packet reassemblies failed" from "netstat -s" (Partial list on a Linux box below). Assuming you did restart the Servers after implementing Jumbo Frames, you should have a very small percentage from these two stats in comparison to the total number of packets. (Unfortunately, there isn't an equivalent to the snapshots of perf data from netstat unless you code that with a shell script):

$ netstat -s
Ip:

   1515397615 total packets received
   0 forwarded
   21 with unknown protocol
   0 incoming packets discarded
   1515384318 incoming packets delivered
   1960954465 requests sent out
   8 fragments dropped after timeout
   26185 reassemblies required
   13057 packets reassembled ok
   8 packet reassembles failed
   13116 fragments received ok
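
If you want a poor man's snapshot history of those counters, a small loop along these lines (just a sketch; the interval, log location and grep pattern are my own assumptions, so adjust to taste) will log timestamped samples you can diff later:

#!/bin/bash
# Sample the IP fragment/reassembly counters from "netstat -s" every 5 minutes
# and append them, timestamped, to a log. Diff successive samples to see how
# quickly the "dropped"/"failed" lines grow relative to total packets.
INTERVAL=300                      # seconds between samples (pick your own)
LOGFILE=/tmp/netstat_frag.log

while true
do
    echo "==== $(date '+%Y-%m-%d %H:%M:%S') ====" >> "$LOGFILE"
    netstat -s | egrep -i 'total packets|fragment|reassembl' >> "$LOGFILE"
    sleep "$INTERVAL"
done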

And of course, you should track %sys time, assuming you have been collecting/storing sar stats.
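
For example, something like this (assuming the sysstat package is already collecting data; file locations vary by distro) lets you line up %system with the times the LMS processes were spinning:

# Quick interactive check: 12 samples, 5 seconds apart - watch the %system column
sar -u 5 12

# Or report today's collected CPU data from the sysstat data files
sar -u -f /var/log/sa/sa$(date +%d)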

> I hear what you're saying, but, because the LMS processes were by far the
> biggest CPU hogs, I was thinking that the overhead of breaking down and
> reassembling packets was the primary cause of CPU starvation.
>
> As I said, we're currently in "wait and see" mode, hoping that we've seen
> the last of these events. Obviously, if I see more CPU starvation, I'll have
> to re-think the root cause. But, as I mentioned before, enabling jumbo frames
> is the "right" thing to do, and there's really no downside, so....
>
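
(On the "no downside" point: one quick sanity check that jumbo frames really are in effect end-to-end on the interconnect is a don't-fragment ping at near-MTU size. The interface name and address below are just placeholders for your interconnect NIC and the other node's interconnect IP.)

# Confirm the interconnect NIC is actually running at MTU 9000
ip link show eth1 | grep -i mtu

# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do sets Don't Fragment,
# so this only succeeds if the full jumbo frame crosses unfragmented
ping -M do -s 8972 -c 3 192.168.10.2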

Also, keep in mind that interconnect traffic consists of both data blocks (larger transfers that would have required fragmentation and reassembly, depending on the MTU) and much smaller messages (roughly 200 bytes). You should be able to see this split in the AWR stats:

Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second       Per Transaction
                                     ---------------       ---------------
 Global Cache blocks received:                259.93                  2.78
   Global Cache blocks served:              1,084.36                 11.58
    GCS/GES messages received:              8,040.38                 85.88
        GCS/GES messages sent:              3,771.97                 40.29
           DBWR Fusion writes:                  6.40                  0.07
Estd Interconnect traffic (KB)             13,061.40
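
If you want to watch those counters between AWR snapshots, the raw cumulative figures are in GV$SYSSTAT. Something along these lines should do it (just a sketch; statistic names can differ slightly between versions, so check against your own v$sysstat):

sqlplus -s / as sysdba <<'EOF'
-- Cumulative-since-startup counters per instance; sample twice and take the
-- delta to get rates comparable to the AWR "Per Second" column.
set pages 100 lines 120
col name format a35
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received', 'gc current blocks received',
                'gc cr blocks served',   'gc current blocks served',
                'gcs messages sent',     'ges messages sent')
ORDER  BY inst_id, name;
EOF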

You should also track the "Global Cache and Enqueue Services - Workload Characteristics" and "Global Cache and Enqueue Services - Messaging Statistics" sections in AWR. If you have AWR data from before the change, that *may* show you whether you improved and by how much....
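
If your AWR retention does go back far enough, a rough before/after comparison of the GC counters is also possible straight from DBA_HIST_SYSSTAT (sketch only; the snap_id range below is a placeholder for whatever brackets your jumbo frames change):

sqlplus -s / as sysdba <<'EOF'
-- Per-snapshot values for a couple of global cache statistics around the
-- change window; values are cumulative, so diff consecutive snapshots.
set pages 200 lines 140
SELECT sn.snap_id, sn.end_interval_time, ss.instance_number,
       ss.stat_name, ss.value
FROM   dba_hist_snapshot sn
       JOIN dba_hist_sysstat ss
            ON  ss.snap_id         = sn.snap_id
            AND ss.dbid            = sn.dbid
            AND ss.instance_number = sn.instance_number
WHERE  ss.stat_name IN ('gc cr blocks received', 'gc current blocks received')
AND    sn.snap_id BETWEEN 1000 AND 1100   -- placeholder range around the change
ORDER  BY ss.stat_name, ss.instance_number, sn.snap_id;
EOF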

Would appreciate your posting any stats and observations you find...

-- 
John Kanagaraj <><
http://www.linkedin.com/in/johnkanagaraj
http://jkanagaraj.wordpress.com (Sorry - not an Oracle blog!)
** The opinions and facts contained in this message are entirely mine and do
not reflect those of my employer or customers **

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Apr 23 2009 - 00:34:42 CDT
