slow, steady increase in RX-DROP on interconnect interfaces

From: Chris Stephens <cstephens16_at_gmail.com>
Date: Mon, 30 Jul 2018 09:44:15 -0500
Message-ID: <CAEFL0szOLdVWmpOytDbQoNNqwJ+WHn90ZN_x7ZK=+K7xop+=2A_at_mail.gmail.com>



Some time ago I noticed a slow, steady increase in dropped packets on the private network interface on each node of our 3-node 12.2 RAC database running on CentOS 7: roughly 25-30 dropped packets per minute.
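
For anyone wanting to reproduce the per-minute figure, here is a minimal Python sketch. It assumes the drops are visible in the RX "drop" column of /proc/net/dev; the function names and the 60-second interval are illustrative only and not part of any existing tool.

```python
import time


def parse_rx_drops(proc_net_dev_text):
    """Return {interface: rx_drop} parsed from the text of /proc/net/dev."""
    drops = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 4:
            continue
        # Receive columns: bytes packets errs drop ...
        drops[name.strip()] = int(fields[3])
    return drops


def drops_per_minute(before, after, seconds):
    """Scale the difference between two snapshots to drops per minute."""
    return {
        nic: (after[nic] - before[nic]) * 60.0 / seconds
        for nic in after
        if nic in before
    }


def sample(interval=60, path="/proc/net/dev"):
    """Take two snapshots `interval` seconds apart and return the drop rate."""
    with open(path) as f:
        before = parse_rx_drops(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_rx_drops(f.read())
    return drops_per_minute(before, after, interval)
```

Calling sample() on each node returns a dictionary keyed by interface name, so the interconnect interface can be compared directly with the public one over the same interval.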

I originally suspected a hardware issue. We've had issues with this particular switch model in the past, but our network engineer has not been able to find anything that would indicate the switch as the source of the issue. I don't think it's an issue with network cards or cables because it's happening on all 3 nodes.

We looked into some of the buffers involved in receiving packets at the card and passing them up through to user space. I wasn't hopeful, since dropped packet counts continue to increase when the system is idle. While I learned a lot about Linux networking, we weren't able to resolve the issue.

"ethtool -S" shows dropped packets, which [I think] indicates problems at the lowest levels of the stack, but I'm out of ideas on where to go from here.
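
One way to narrow this down might be to diff two "ethtool -S" snapshots and see which specific counters are moving. Below is a rough Python sketch, assuming the usual "name: value" output layout. Counter names are driver-specific, so the ones mentioned in the note after the code are placeholders, and the helper names are hypothetical.

```python
import subprocess


def read_ethtool_stats(iface):
    """Run `ethtool -S <iface>` and return its raw text output."""
    result = subprocess.run(
        ["ethtool", "-S", iface],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ethtool_stats(text):
    """Turn `ethtool -S` output into {counter_name: int}."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon; the "NIC statistics:" header has no value.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed_counters(before, after):
    """Return {counter: delta} for every counter that moved between snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }
```

Taking one snapshot with parse_ethtool_stats(read_ethtool_stats(iface)), waiting a minute, taking another, and passing both to changed_counters() leaves only the counters that actually changed. If the moving counter is something like a "no buffer" or "missed" counter rather than a generic drop counter, that would at least hint at which stage is discarding packets.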

When I shut down the only database in the cluster, dropped packets still increase.

When I stop the whole cluster, dropped packets still increase. However, I did notice that orarootagent.bin and oraagent.bin are still running after "crsctl stop cluster -all", which I haven't noticed in the past.

There were some EM13c agents running, but after stopping those, dropped packets still increase.

Does anyone have any suggestions that might lead to figuring out what packets are being dropped?

Thanks, as always, for any insights.

Chris

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 30 2018 - 16:44:15 CEST
