RE: RAC 'gc blocks lost' in stress test
Date: Mon, 17 Mar 2008 18:59:08 -0400
Although you don't have that many dropped blocks, ideally you should be at zero. We have RAC systems that can run for a month with zero dropped packets. Here is a quick example of a heavily used system that has only been running for a few days, and reports zero errors.
bond0     Link encap:Ethernet  HWaddr 00:17:08:7D:B6:54
          inet addr:10.193.4.7  Bcast:10.193.4.31  Mask:255.255.255.224
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:464186898 errors:0 dropped:0 overruns:0 frame:0
          TX packets:510848771 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:472399237908 (439.9 GiB)  TX bytes:532685382437
This would be on Intel-based HP systems, gigabit Ethernet, bonded, and connected to Cisco 3750 switches.
I would say that if the problem does not grow, you should be OK. If it grows, then have your network or platform people do a QA on the network hardware and configuration. Also, just for my info, do you run hugepages?
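To answer the hugepages question, a quick read-only check on Linux is sketched below; the /proc paths are the standard kernel locations, and a HugePages_Total of zero means no hugepages are configured:

```shell
# Show hugepage accounting; HugePages_Total > 0 means hugepages are in use/configured.
grep -i '^huge' /proc/meminfo

# The kernel-level hugepage pool size (same value as HugePages_Total):
cat /proc/sys/vm/nr_hugepages
```

For Oracle to actually use them, the SGA must fit in the pool and the oracle user needs an adequate memlock limit in /etc/security/limits.conf.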
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Yong Huang
Sent: Monday, March 17, 2008 3:05 PM
Subject: RAC 'gc blocks lost' in stress test
We're using Swingbench to stress test an 8-node 10.2.0.3 RAC running on
Linux (2.6.9-55.ELsmp x86_64). The total number of non-idle active sessions is between
200 and 300 (select count(*) from gv$session where status = 'ACTIVE' and wait_class != 'Idle'). At this stress level, we start to see the 'gc blocks lost'
statistic going up, sometimes by as much as 10 in 2 hours. gv$cr_block_server.fail_results also goes up.
Here's the corresponding interconnect interface (bonded over two e1000 NICs)
on the worst node:
# ifconfig bond1
bond1     Link encap:Ethernet  HWaddr 00:1B:78:58:F1:12
          inet addr:192.168.2.16  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:fe58:f112/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:190455470 errors:0 dropped:146 overruns:0 frame:0
          TX packets:199397251 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:135262879485 (125.9 GiB)  TX bytes:147167839800
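One way to see whether the drops are still accumulating is to sample the kernel's per-interface counter twice; a minimal sketch, assuming the interface name bond1 from the output above and the standard 2.6-kernel sysfs statistics path:

```shell
#!/bin/sh
# Sample the receive-drop counter twice and report the delta.
IF=${1:-bond1}                              # interface name; bond1 per the output above
STAT=/sys/class/net/$IF/statistics/rx_dropped
before=$(cat "$STAT")
sleep 60                                    # sampling window; adjust to taste
after=$(cat "$STAT")
echo "$IF: rx_dropped grew by $((after - before)) packets in 60s"
```

Run it during the Swingbench test; a delta of zero over a busy interval suggests the 146 drops are historical rather than ongoing.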
Note the 146 dropped packets. We already enabled RX flow control
(Note:400959.1). We tried setting txqueuelen to 1000 and tried enabling TX flow
control. Nothing helped. /etc/sysctl.conf has: ...
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.wmem_max = 1048576
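To confirm those settings actually took effect at runtime, the values can be read back; a read-only sketch, using the /proc files that mirror the sysctl key names (dots become slashes):

```shell
# Each net.core.* sysctl maps to a file under /proc/sys/net/core/.
for k in rmem_default rmem_max wmem_default wmem_max; do
    printf 'net.core.%s = %s\n' "$k" "$(cat /proc/sys/net/core/$k)"
done
# After editing /etc/sysctl.conf, reload without a reboot: sysctl -p
```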
I'm guessing further increasing those values just delays the onset of the problem.
I posted a message to
It's possible that a ratio of 146/190455470 is low enough for us not to
worry. But in case anybody has any suggestions, I'm posting this message here.
Received on Mon Mar 17 2008 - 17:59:08 CDT