Re: RAC interconnect packet size??
Date: Wed, 22 Apr 2009 22:34:42 -0700
Hi Mark (and Joe!),
While you are in "wait and see" mode, you might want to track the "fragments dropped after timeout" and "packet reassembles failed" counters from "netstat -s" (partial listing from a Linux box below). Assuming you restarted the servers after implementing Jumbo Frames, these two counters should be a very small percentage of the total number of packets. (The counters are cumulative since boot, and unfortunately netstat has no built-in equivalent of periodic perf data snapshots, so you would have to code that yourself with a shell script.)
$ netstat -s
1515397615 total packets received
21 with unknown protocol
0 incoming packets discarded
1515384318 incoming packets delivered
1960954465 requests sent out
8 fragments dropped after timeout
26185 reassemblies required
13057 packets reassembled ok
8 packet reassembles failed
13116 fragments received ok
And of course, you should also track %sys time if you are collecting/storing sar stats.
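As a rough illustration of the snapshot idea, here is a minimal sketch in Python (rather than shell) that parses "netstat -s" text, computes the two failure counters as a percentage of total packets received, and diffs two snapshots. The function names and the counter labels in PATTERNS are my own assumptions; the labels are copied from the Linux output above and may be worded differently on other platforms or netstat versions.

```python
import re
import subprocess

# Counter labels as they appear in the Linux "netstat -s" output above.
# Assumption: other platforms/versions may word these differently.
PATTERNS = {
    "total_received": r"(\d+) total packets received",
    "frag_timeout_drops": r"(\d+) fragments dropped after timeout",
    "reassemblies_required": r"(\d+) reassemblies required",
    "reassembled_ok": r"(\d+) packets reassembled ok",
    "reassembly_failures": r"(\d+) packet reassembles failed",
}


def parse_netstat(text):
    """Extract the counters of interest from 'netstat -s' output text."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counters[name] = int(match.group(1))
    return counters


def failure_pct(counters):
    """Timeout drops plus failed reassemblies as a % of packets received."""
    total = counters.get("total_received", 0)
    if total == 0:
        return 0.0
    bad = (counters.get("frag_timeout_drops", 0)
           + counters.get("reassembly_failures", 0))
    return 100.0 * bad / total


def delta(before, after):
    """Difference between two snapshots (the counters are cumulative)."""
    return {k: after[k] - before[k] for k in after if k in before}


def take_snapshot():
    """Run 'netstat -s' on this host and parse it (needs netstat installed)."""
    result = subprocess.run(["netstat", "-s"], capture_output=True,
                            text=True, check=True)
    return parse_netstat(result.stdout)


# Demo on the figures quoted in this message, so no live host is needed.
SAMPLE = """
1515397615 total packets received
8 fragments dropped after timeout
26185 reassemblies required
13057 packets reassembled ok
8 packet reassembles failed
"""

sample_counters = parse_netstat(SAMPLE)
print(sample_counters)
print("failure %% of total received: %.8f" % failure_pct(sample_counters))
```

To use it for tracking, you could call take_snapshot() from cron at a fixed interval, store the result with a timestamp, and pass consecutive snapshots to delta() so that you see the change per interval rather than the totals since boot.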
> I hear what you're saying, but, because the LMS processes were by far the
> biggest CPU hogs, I was thinking that the overhead of breaking down and
> reassembling packets was the primary cause of CPU starvation.
>
> As I said, we're currently in "wait and see" mode, hoping that we've seen
> the last of these events. Obviously, if I see more CPU starvation, I'll have
> to re-think the root cause. But, as I mentioned before, enabling jumbo frames
> is the "right" thing to do, and there's really no downside, so....
Also, keep in mind that interconnect traffic consists of both data blocks (the larger packets, which may require fragmentation and reassembly depending on the MTU size) and smaller (~200 bytes?) messages. You should be able to see this mix in the AWR stats:
Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second       Per Transaction
                                      ---------------       ---------------
  Global Cache blocks received:                259.93                  2.78
  Global Cache blocks served:                1,084.36                 11.58
  GCS/GES messages received:                 8,040.38                 85.88
  GCS/GES messages sent:                     3,771.97                 40.29
  DBWR Fusion writes:                            6.40                  0.07
  Estd Interconnect traffic (KB)            13,061.40
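To make the block/message split concrete, here is a small arithmetic sketch using the per-second figures from the profile above. Two inputs are assumptions that the report does not state: an 8 KB database block size and 200 bytes per GCS/GES message. With those assumptions the estimate lands within rounding of the reported "Estd Interconnect traffic" figure, which suggests they are at least plausible for this system.

```python
# Per-second rates copied from the Global Cache Load Profile above.
blocks_received = 259.93
blocks_served = 1084.36
msgs_received = 8040.38
msgs_sent = 3771.97

# Assumptions (not stated in the AWR excerpt):
BLOCK_SIZE = 8192   # bytes per data block, i.e. an 8 KB db_block_size
MSG_SIZE = 200      # bytes per GCS/GES message (the "~200 bytes?" guess)

# Bytes per second moved as data blocks vs. as small messages.
block_bytes = (blocks_received + blocks_served) * BLOCK_SIZE
msg_bytes = (msgs_received + msgs_sent) * MSG_SIZE

est_kb = (block_bytes + msg_bytes) / 1024
block_share = block_bytes / (block_bytes + msg_bytes)
msg_count_share = (msgs_received + msgs_sent) / (
    blocks_received + blocks_served + msgs_received + msgs_sent)

print("estimated interconnect traffic: %.2f KB/s" % est_kb)
print("share of bytes carried as data blocks: %.1f%%" % (100 * block_share))
print("share of transfers that are small messages: %.1f%%"
      % (100 * msg_count_share))
```

Under these assumptions, data blocks account for most of the bytes while small messages account for most of the individual transfers, so only the block portion of the traffic would be expected to benefit directly from a larger MTU.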
You should also track the "Global Cache and Enqueue Services - Workload Characteristics" and "Global Cache and Enqueue Services - Messaging Statistics" sections in AWR. If you have AWR data from before the change, that *may* show you whether things improved and by how much....
Would appreciate your posting any stats and observations you find...
--
John Kanagaraj <><
http://www.linkedin.com/in/johnkanagaraj
http://jkanagaraj.wordpress.com (Sorry - not an Oracle blog!)
** The opinions and facts contained in this message are entirely mine and do not reflect those of my employer or customers **
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Apr 23 2009 - 00:34:42 CDT