RE: RAC 'gc blocks lost' in stress test

From: Crisler, Jon <Jon.Crisler_at_usi.com>
Date: Mon, 17 Mar 2008 18:59:08 -0400
Message-ID: <56211FD5795F8346A0719FEBC0DB0675020335FD@mds3aex08.USIEXCHANGE.COM>

Although you don't have that many lost blocks, ideally you should be at zero. We have RAC systems that can run for a month with zero dropped packets. Here is a quick example of a heavily used system that has only been running for a few days and reports zero errors.

bond0     Link encap:Ethernet  HWaddr 00:17:08:7D:B6:54  
          inet addr:10.193.4.7  Bcast:10.193.4.31  Mask:255.255.255.224
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:464186898 errors:0 dropped:0 overruns:0 frame:0
          TX packets:510848771 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:472399237908 (439.9 GiB)  TX bytes:532685382437 (496.1 GiB)
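
If you want to see exactly where any drops accumulate, something like the following is a quick check (just a sketch; bond0/eth2/eth3 below are placeholders for whatever your bond and slave interfaces are actually called):

# Compare the bond master's counters with each slave's; most gigabit drivers
# (e1000 included) expose detailed per-NIC counters through ethtool -S
for i in bond0 eth2 eth3; do
    echo "== $i =="
    ifconfig $i | grep -E 'RX packets|TX packets'
    ethtool -S $i 2>/dev/null | grep -iE 'drop|miss|err' | grep -v ': 0$'
done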

This is on Intel-based HP systems with bonded gigabit Ethernet, connected to Cisco 3750 switches.

I would say that if the problem does not grow, you should be OK. If it grows, then have your network or platform people do a QA pass on the network hardware and configuration. Also, just for my info, do you run hugepages?
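
To check that quickly on your side, something along these lines works on RHEL (just the kernel counters, nothing Oracle-specific):

# HugePages pool as seen by the kernel; if HugePages_Total is non-zero and
# HugePages_Free is well below it, the SGA is most likely sitting in hugepages
grep Huge /proc/meminfo
sysctl vm.nr_hugepages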

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Yong Huang
Sent: Monday, March 17, 2008 3:05 PM
To: oracle-l_at_freelists.org
Subject: RAC 'gc blocks lost' in stress test

We're using Swingbench to stress test an 8-node 10.2.0.3 RAC running on RH Linux (2.6.9-55.ELsmp x86_64). The total number of non-idle active sessions is between 200 and 300 (select count(*) from gv$session where status = 'ACTIVE' and wait_class != 'Idle'). At this stress level, we start to see the 'gc blocks lost' statistic going up, sometimes by as much as 10 in 2 hours. gv$cr_block_server.fail_results also goes up.
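
For anyone who wants to reproduce the check, queries roughly like the following show the same counters (a sketch, run as SYSDBA; statistic names as they appear in 10.2):

# Snapshot lost/corrupt block counters and CR-server failures on every instance
sqlplus -s "/ as sysdba" <<'EOF'
set pagesize 100 linesize 120
column name format a25
select inst_id, name, value
from   gv$sysstat
where  name in ('gc blocks lost', 'gc blocks corrupt')
order  by inst_id, name;

select inst_id, fail_results
from   gv$cr_block_server
order  by inst_id;
EOF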

Here's the corresponding interconnect interface (bonding with two e1000 slaves)
on the worst node:

# ifconfig bond1

bond1     Link encap:Ethernet  HWaddr 00:1B:78:58:F1:12
          inet addr:192.168.2.16  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:fe58:f112/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:190455470 errors:0 dropped:146 overruns:0 frame:0
          TX packets:199397251 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:135262879485 (125.9 GiB)  TX bytes:147167839800 (137.0 GiB)

Note the 146 dropped packets. We already enabled RX flow control (Note 400959.1). We also tried setting txqueuelen to 1000 and enabling TX flow control. Nothing helped. /etc/sysctl.conf has: ...

net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.wmem_max = 1048576

...
I'm guessing further increasing those values just delays the onset of the
problem.
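
For what it's worth, a rough way to separate NIC-level drops from socket-buffer overruns is something like this (a sketch; eth2/eth3 stand in for the two e1000 slaves):

# "packet receive errors" under Udp: generally means the receiving socket
# buffer filled up before the process could drain it
netstat -su | grep -A 6 '^Udp:'

# Socket buffer limits currently in force
sysctl net.core.rmem_max net.core.rmem_default

# Pause-frame (flow control) negotiation on each bond slave
for i in eth2 eth3; do
    echo "== $i =="
    ethtool -a $i
done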

I posted a message to

http://groups.google.com/group/comp.os.linux.networking/browse_frm/thread/8317f37d0471d92d

It's possible that the ratio of 146/190455470 is low enough for us not to worry. But in case anybody has any suggestions, I'm posting this message here.
Thanks.

Yong Huang  






--
http://www.freelists.org/webpage/oracle-l
Received on Mon Mar 17 2008 - 17:59:08 CDT
