RAC 'gc blocks lost' in stress test
Date: Mon, 17 Mar 2008 12:04:55 -0700 (PDT)
Message-ID: <540914.53393.qm@web80615.mail.mud.yahoo.com>
We're using Swingbench to stress test an 8-node 10.2.0.3 RAC running on RH
Linux (2.6.9-55.ELsmp x86_64). The total number of non-idle active sessions is
between 200 and 300 (select count(*) from gv$session where status = 'ACTIVE'
and wait_class != 'Idle'). At this stress level, we start to see the 'gc blocks
lost' statistic going up, sometimes by as much as 10 in 2 hours.
gv$cr_block_server.fail_results also goes up.
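In case it's useful, the query we run to watch the counter per instance is
basically this (the LIKE pattern is just a convenience and will also pick up
related statistics such as 'gc claim blocks lost'):

  -- lost global cache blocks, per instance
  select inst_id, name, value
  from   gv$sysstat
  where  name like 'gc%lost'
  order  by inst_id, name;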
Here's the corresponding interconnect interface (bonding with two e1000 slaves) on the worst node:
# ifconfig bond1
bond1     Link encap:Ethernet  HWaddr 00:1B:78:58:F1:12
          inet addr:192.168.2.16  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:fe58:f112/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:190455470 errors:0 dropped:146 overruns:0 frame:0
          TX packets:199397251 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:135262879485 (125.9 GiB)  TX bytes:147167839800 (137.0 GiB)
Note the 146 dropped packets. We have already enabled RX flow control
(Note:400959.1), tried setting txqueuelen to 1000, and tried enabling TX flow
control. Nothing helped. /etc/sysctl.conf has:
...
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.wmem_max = 1048576
...
I'm guessing further increasing those values just delays the onset of the problem.
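For what it's worth, here's roughly what we've been running to see whether the
drops show up on the bond, on an individual e1000 slave, or at the UDP socket
layer (eth2/eth3 below are just placeholders for our slave interface names):

  # which slaves belong to bond1, and their current state
  cat /proc/net/bonding/bond1

  # per-slave e1000 driver counters (rx_no_buffer_count, rx_missed_errors, etc.)
  ethtool -S eth2 | grep -Ei 'drop|miss|no_buffer'
  ethtool -S eth3 | grep -Ei 'drop|miss|no_buffer'

  # current vs. maximum RX/TX ring sizes on a slave
  ethtool -g eth2

  # UDP-level receive errors (the interconnect traffic is UDP on Linux)
  netstat -su

The idea is just to see whether the 146 drops correspond to a NIC-level counter
(ring buffer exhaustion) or to 'packet receive errors' in the UDP statistics
(socket buffer exhaustion).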
I posted a message to
http://groups.google.com/group/comp.os.linux.networking/browse_frm/thread/8317f37d0471d92d
It's possible that the drop ratio of 146/190455470 is low enough that we shouldn't worry. But in case anybody has any suggestion, I'm posting this message here. Thanks.
Yong Huang
--
http://www.freelists.org/webpage/oracle-l