Re: RAC node "has a disk HB, but no network HB" but traceroute

From: Yong Huang <"Yong>
Date: Thu, 5 Jan 2017 22:08:40 +0000 (UTC)
Message-ID: <748075613.1140263.1483654120940_at_mail.yahoo.com>


Thanks, Justin, Jure and Martin. Martin's article is great. The key is to interpret "no network HB" as "there are 2 or more processes which missed to communicate" rather than as a network problem. That's exactly what I meant in the SR I opened when I wrote "We begin to doubt about the meaning of the "no network HB" message". So far the SR hasn't gone anywhere, despite our uploading various types of logs.

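Our log does show a fast increase in IP fragments needing reassembly, while the count of successfully reassembled packets stayed flat and the failure count grew almost one-for-one with the new fragments:

$ egrep '^zzz|reassembl' <OSWatcher netstat log>
...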
zzz Sun Dec 18 02:01:58 CST 2016
555539624 reassemblies required
100653307 packets reassembled ok
60026 packet reassembles failed
zzz Sun Dec 18 02:02:28 CST 2016
555545702 reassemblies required
100653307 packets reassembled ok
66103 packet reassembles failed
zzz Sun Dec 18 02:02:58 CST 2016
555551748 reassemblies required
100653307 packets reassembled ok
72149 packet reassembles failed
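Because these netstat counters are cumulative, the per-interval deltas make the pattern clearer. Below is a minimal Python sketch, assuming the egrep output has exactly the shape quoted above (a `zzz <timestamp>` marker followed by the counter lines); `parse_samples` and `interval_deltas` are hypothetical helper names, not part of OSWatcher.

```python
import re

# Counter lines from `netstat -s` (as captured by OSWatcher) that we track,
# mapped to short keys.
COUNTERS = {
    "reassemblies required": "required",
    "packets reassembled ok": "ok",
    "packet reassembles failed": "failed",
}

# The excerpt quoted above, used here as sample input.
EXCERPT = """\
zzz Sun Dec 18 02:01:58 CST 2016
555539624 reassemblies required
100653307 packets reassembled ok
60026 packet reassembles failed
zzz Sun Dec 18 02:02:28 CST 2016
555545702 reassemblies required
100653307 packets reassembled ok
66103 packet reassembles failed
zzz Sun Dec 18 02:02:58 CST 2016
555551748 reassemblies required
100653307 packets reassembled ok
72149 packet reassembles failed
"""


def parse_samples(text):
    """Split egrep output into one dict per 'zzz' snapshot."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("zzz"):
            # Each OSWatcher snapshot starts with a 'zzz <timestamp>' marker.
            samples.append({"timestamp": line[3:].strip()})
            continue
        m = re.match(r"(\d+)\s+(.+)$", line)
        if m and samples and m.group(2) in COUNTERS:
            samples[-1][COUNTERS[m.group(2)]] = int(m.group(1))
    # Keep only snapshots that contain all three counters.
    return [s for s in samples if all(k in s for k in COUNTERS.values())]


def interval_deltas(samples):
    """The counters are cumulative, so subtract consecutive snapshots."""
    return [
        {"from": prev["timestamp"], "to": cur["timestamp"],
         **{k: cur[k] - prev[k] for k in COUNTERS.values()}}
        for prev, cur in zip(samples, samples[1:])
    ]


if __name__ == "__main__":
    for d in interval_deltas(parse_samples(EXCERPT)):
        print(f"{d['to']}: +{d['required']} required, "
              f"+{d['ok']} ok, +{d['failed']} failed")
```

On the excerpt this reports +6078 required, +0 ok, +6077 failed for the first 30-second interval and +6046 required, +0 ok, +6046 failed for the second.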

Of all the documents I found, Red Hat's "IP fragmentation fails and fragmented packets get dropped" at https://access.redhat.com/solutions/1498603 is a good one, but you have to log in to read it. In short, if I understand its confusing Root Cause section correctly, kernel-2.6.32-477.el6 or RHEL 6.6 has a bug that miscalculates IP fragmentation memory, which causes false evictions (i.e. drops) of IP fragments on systems with many CPUs. (Our problem server has 80 CPUs; the other servers have far fewer.) The solution is to upgrade the kernel or the Red Hat release. An easy workaround is to increase the fragmentation buffer size; the article says doubling the fragmentation thresholds is enough, i.e. from the default 4 MB to 8 MB.

We'll set the IP fragmentation buffer low and high thresholds to 15 MB and 16 MB, per Oracle note 2008933.1. I think the "fragments dropped after timeout" counter in `netstat -s` is related to /proc/sys/net/ipv4/ipfrag_time; since ours was fairly stable even before the crash, I'll leave that parameter alone for now.
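A minimal Python sketch of that change, assuming the "15 and 16 MB" from Oracle note 2008933.1 mean binary megabytes (2^20 bytes); check the note for its exact byte values. The helper names are hypothetical. It prints sysctl.conf lines rather than applying them, so they can be reviewed first and then loaded with `sysctl -p`.

```python
# Assumes "MB" means 2**20 bytes; the sysctls themselves take bytes.
MB = 1024 * 1024


def ipfrag_sysctl_lines(low_mb, high_mb):
    """Return /etc/sysctl.conf lines for the IPv4 fragment reassembly thresholds."""
    if not 0 < low_mb < high_mb:
        raise ValueError("need 0 < low_mb < high_mb")
    # The high threshold is written first so that, when raising both values,
    # the low threshold never exceeds the high one partway through
    # (some newer kernels reject low > high).
    return [
        f"net.ipv4.ipfrag_high_thresh = {high_mb * MB}",
        f"net.ipv4.ipfrag_low_thresh = {low_mb * MB}",
    ]


def read_current(prefix="/proc/sys/net/ipv4/"):
    """Read the live values; None where a file is missing or unreadable."""
    values = {}
    for name in ("ipfrag_low_thresh", "ipfrag_high_thresh", "ipfrag_time"):
        try:
            with open(prefix + name) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    print("current:", read_current())
    print("\n".join(ipfrag_sysctl_lines(15, 16)))
```

With 15 and 16 the proposed lines are `net.ipv4.ipfrag_high_thresh = 16777216` followed by `net.ipv4.ipfrag_low_thresh = 15728640`.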

Now I think I know why our OSWatcher did not report a traceroute problem at the last crash: the default packet size used by traceroute is only 60 bytes, so its probes are never fragmented. To detect the problem, we should append a packet length argument to the traceroute command with a value greater than 1500 bytes, the Ethernet MTU.
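A small Python sketch of such a probe command, assuming the Linux traceroute syntax `traceroute host packetlen`; `traceroute_cmd`, `run_traceroute` and the host name are hypothetical. If the interconnect uses a larger MTU (for example jumbo frames), pass that value instead so the probes still exceed it.

```python
import shlex
import subprocess

ETHERNET_MTU = 1500  # standard Ethernet MTU, in bytes


def traceroute_cmd(host, mtu=ETHERNET_MTU, margin=100):
    """Build a Linux traceroute command whose probes exceed the MTU.

    Linux traceroute accepts the total probe length as an optional
    positional argument after the host; a length above the MTU makes
    the probes get fragmented (unless -F, don't fragment, is given).
    """
    if margin <= 0:
        raise ValueError("margin must be positive so the probe exceeds the MTU")
    return ["traceroute", host, str(mtu + margin)]


def run_traceroute(host, mtu=ETHERNET_MTU, timeout=120):
    """Run the command and return its stdout (requires traceroute on PATH)."""
    result = subprocess.run(traceroute_cmd(host, mtu), capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout


if __name__ == "__main__":
    # "racnode2-priv" is a hypothetical private-interconnect hostname.
    print(shlex.join(traceroute_cmd("racnode2-priv")))
```

Building the argument list separately from running it keeps the command easy to log in an OSWatcher-style wrapper and easy to check without network access.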

Yong Huang

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jan 05 2017 - 23:08:40 CET
