RAC 'gc blocks lost' in stress test

From: Yong Huang <yong321_at_yahoo.com>
Date: Mon, 17 Mar 2008 12:04:55 -0700 (PDT)
Message-ID: <540914.53393.qm@web80615.mail.mud.yahoo.com>


We're using Swingbench to stress test an 8-node 10.2.0.3 RAC running on RH Linux (2.6.9-55.ELsmp x86_64). The number of non-idle active sessions is between 200 and 300 (select count(*) from gv$session where status = 'ACTIVE' and wait_class != 'Idle'). At this stress level, we start to see the 'gc blocks lost' statistic going up, sometimes as fast as 10 in 2 hours. gv$cr_block_server.fail_results also goes up.

Here's the corresponding interconnect interface (bonding with two e1000 slaves) on the worst node:

# ifconfig bond1

bond1     Link encap:Ethernet  HWaddr 00:1B:78:58:F1:12
          inet addr:192.168.2.16  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:fe58:f112/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:190455470 errors:0 dropped:146 overruns:0 frame:0
          TX packets:199397251 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:135262879485 (125.9 GiB)  TX bytes:147167839800 (137.0 GiB)

Note the 146 dropped RX packets. We have already enabled RX flow control (Note:400959.1). We also tried setting txqueuelen to 1000 and enabling TX flow control. Neither helped. /etc/sysctl.conf has:
...

net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.wmem_max = 1048576

...

I'm guessing that increasing those values further would just delay the onset of the problem.
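
For reference, each of the four values in the sysctl.conf excerpt is exactly 1 MiB. A minimal Python sketch that checks this is shown here. It is not part of the original setup: the function name parse_sysctl is made up for illustration, and it only handles simple integer "key = value" lines.

```python
# Values copied from the /etc/sysctl.conf excerpt in this message.
SYSCTL_EXCERPT = """\
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.wmem_max = 1048576
"""

def parse_sysctl(text):
    """Parse integer 'key = value' lines into a dict, skipping blanks and comments.

    Hypothetical helper for illustration; multi-valued settings are not handled.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings

settings = parse_sysctl(SYSCTL_EXCERPT)
for key, value in sorted(settings.items()):
    # 2**20 bytes = 1 MiB
    print(f"{key}: {value} bytes = {value / 2**20:g} MiB")
```

The same function could be pointed at the text of a real /etc/sysctl.conf, provided the lines it meets hold single integer values.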

I posted a message to

http://groups.google.com/group/comp.os.linux.networking/browse_frm/thread/8317f37d0471d92d

It's possible that the ratio of 146/190455470 is low enough that we don't need to worry. But in case anybody has any suggestions, I'm posting this message here. Thanks.
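
To put that ratio in perspective, here is a minimal Python sketch that pulls the counters out of the RX line of the ifconfig output and computes dropped packets relative to packets received. The function names parse_counters and drop_ratio are made up for illustration, and the regular expression assumes the "name:value" layout shown in this message.

```python
import re

# RX line copied from the ifconfig bond1 output in this message.
RX_LINE = "RX packets:190455470 errors:0 dropped:146 overruns:0 frame:0"

def parse_counters(line):
    """Return the name:value counters on an ifconfig RX or TX line as ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}

def drop_ratio(counters):
    """Dropped packets divided by packets received (0.0 if none received)."""
    packets = counters["packets"]
    return counters["dropped"] / packets if packets else 0.0

counters = parse_counters(RX_LINE)
ratio = drop_ratio(counters)
print(f"dropped {counters['dropped']} of {counters['packets']} packets")
print(f"ratio = {ratio:.3e}, about 1 in {round(1 / ratio):,}")
```

The counters are cumulative since the interface came up, so this gives only a lifetime average. Comparing two samples taken during a stress run would show whether the drops coincide with the 'gc blocks lost' increases.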

Yong Huang



Received on Mon Mar 17 2008 - 14:04:55 CDT
