Re: 99% IOWAIT with Oracle RAC 10g (10.1.0.4) on Linux
After a few days of fault finding I've managed to get to the root of
the problem (thanks to the help of Jermey and Daniel):
The /var/log/messages file had the following entries in it every time the IOWAIT went through the roof:
Jun 13 12:20:30 linux1 kernel: ieee1394: sbp2: aborting sbp2 command
Jun 13 12:20:30 linux1 kernel: Read (10) 00 05 e7 d2 80 00 00 05 00
Jun 13 12:20:30 linux1 kernel: ieee1394: sbp2: aborting sbp2 command
Jun 13 12:20:30 linux1 kernel: Read (10) 00 00 15 9a 60 00 00 05 00
So the problem appeared to be either in the sbp2 driver or the hard drive itself. The hard drive has the Oxford 911 chipset so my investigation centered around the sbp2 driver.
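To check whether a node is hitting the same aborts, the log can be searched for the message shown above. This is only a rough sketch: count_sbp2_aborts is a made-up helper name, and it assumes the kernel messages land in a plain-text syslog file such as /var/log/messages.

```shell
#!/bin/sh
# Count sbp2 abort messages in a syslog-style file.
# count_sbp2_aborts is a hypothetical helper name, not part of any tool.
count_sbp2_aborts() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -c 'sbp2: aborting sbp2 command' "$1" || true
}

# Usage (reading /var/log/messages may need root):
#   count_sbp2_aborts /var/log/messages
```

A non-zero count whose timestamps line up with the IOWAIT spikes would point at the same problem.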
A good dig around Google for the abort messages above led me to an optional parameter for loading the sbp2 module:
sbp2_serialize_io
Adding the following line to /etc/modules.conf and rebooting each node solved the problem:
options sbp2 sbp2_serialize_io=1
This option is generally used to work around bugs in the sbp2 driver, or for debugging, so I suspect it may be slower than the default setting. For my purposes, though, stability is the priority.
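With several RAC nodes to change, the edit could be scripted so the line is only added once per file. A minimal sketch, assuming a POSIX shell; ensure_sbp2_serialize_io is a hypothetical helper name, and the config file path is passed in rather than hard-coded.

```shell
#!/bin/sh
# Append the sbp2 option to a module config file unless it is already there.
# ensure_sbp2_serialize_io is a hypothetical helper name.
ensure_sbp2_serialize_io() {
    conf="$1"
    line='options sbp2 sbp2_serialize_io=1'
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Usage (as root, on each node, followed by a reboot):
#   ensure_sbp2_serialize_io /etc/modules.conf
```

Running it a second time leaves the file unchanged, so it should be safe to repeat.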
Thanks to everyone who contributed to the thread....
BTW - to confirm that the option has taken effect, check for the following string in the 'dmesg' output:
ieee1394: sbp2: Driver forced to serialize I/O (serialize_io = 1)
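The same check can be wrapped up for use after each reboot. Again just a sketch: sbp2_serialized is a hypothetical helper name, and it reads the kernel log text from stdin so it works with dmesg or a saved log.

```shell
#!/bin/sh
# Succeed (exit 0) if the text on stdin contains the sbp2 message that
# confirms serialized I/O. sbp2_serialized is a hypothetical helper name.
sbp2_serialized() {
    # -q suppresses output, -F matches the string literally.
    grep -qF 'sbp2: Driver forced to serialize I/O'
}

# Usage:
#   dmesg | sbp2_serialized && echo "serialize_io is active"
```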
Cheers
Matt
Received on Tue Jun 14 2005 - 10:29:28 CDT