RE: Oracle 10g hangs intermittently waiting for I/O

From: Matthew Zito <mzito_at_gridapp.com>
Date: Fri, 15 May 2009 11:41:17 -0400
Message-ID: <C0A5E31718FC064A91E9FD7BE2F081B1020BFCF2_at_exchange.gridapp.com>


Ok, well, then, we'll move to plan B. Do a test like this while everything is working fine:

[root_at_rh45-rac001-01 disks]# time dd if=/dev/oracleasm/disks/BAR of=/tmp/dd.out bs=1024k count=10
10+0 records in
10+0 records out

real    0m0.911s
user    0m0.002s
sys     0m0.146s

[root_at_rh45-rac001-01 disks]#

That'll tell you roughly how fast you should expect to be able to read 10MB off of one of those disks. Do this a few times for the various disks that you think you might have problems with.

Then, when you're having the problems, do the same test. If the dd takes dramatically longer to run, or hangs altogether, then you know it's an OS/storage issue. If everything hums along just as before, then it's probably an Oracle issue.
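If you want to repeat that comparison across several disks without retyping the command, something along these lines might help. It is only a rough sketch: the time_read function name is made up for illustration, it assumes bash with GNU date (for %N) and awk, and it reads to /dev/null rather than /tmp/dd.out so that writes to /tmp are not part of the measurement. Repeated reads of the same blocks may be served from the page cache, so treat the first run on each disk as the most honest number.

```shell
#!/bin/bash
# Hypothetical helper: time a fixed-size sequential read from one device or file.
# Usage: time_read <device> [count_in_MB]   (defaults to 10 MB, as in the test above)
time_read() {
    local dev=$1 count=${2:-10}
    local start end
    start=$(date +%s.%N)    # GNU date: seconds.nanoseconds
    # Read $count blocks of 1 MB; fail if dd cannot read the device.
    dd if="$dev" of=/dev/null bs=1024k count="$count" 2>/dev/null || return 1
    end=$(date +%s.%N)
    # Print "<device> <elapsed seconds>" with millisecond precision.
    LC_ALL=C awk -v d="$dev" -v s="$start" -v e="$end" \
        'BEGIN { printf "%s %.3f\n", d, e - s }'
}

# Time every device given on the command line, e.g.
#   ./time_read.sh /dev/oracleasm/disks/BAR /dev/oracleasm/disks/...
for dev in "$@"; do
    time_read "$dev" || echo "$dev FAILED" >&2
done
```

Run it once while things are healthy and keep the output, then run it again during a stall. A read that takes many times longer, or never returns, points the same way as the manual dd test.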

Thanks,
Matt

-----Original Message-----
From: Paweł Kotlarz [mailto:pkotla_at_go2.pl]
Sent: Friday, May 15, 2009 10:43 AM
To: Matthew Zito
Cc: oracle-l_at_freelists.org
Subject: Re: Oracle 10g hangs intermittently waiting for I/O

Hi Matt,

dmesg shows only timeouts on a cdrom drive and reservation conflicts on tape devices. Multipathing is not used.

scsi3 (0,2,0) : reservation conflict
scsi3 (0,2,0) : reservation conflict
ide-cd: cmd 0x1e timed out

hda: irq timeout: status=0xd0 { Busy }
hda: irq timeout: error=0x00
hda: ATAPI reset complete

ide-cd: cmd 0x25 timed out
hda: irq timeout: status=0xd0 { Busy }
hda: irq timeout: error=0x00
hda: ATAPI reset complete


Thanks,

Pawel

On 2009-05-15 15:57, Matthew Zito wrote:
> If you run a "dmesg" - do you see any errors in the kernel logs? If the devices stop responding to I/O for periods of time there should be SCSI timeouts in the logs, or at least some warnings from the multipathing driver.
>
> Thanks,
> Matt
>
> --
> Matthew Zito
> Chief Scientist
> GridApp Systems
> P: 646-452-4090
> mzito_at_gridapp.com
> http://www.gridapp.com
>
>
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org on behalf of Pawel Kotlarz
> Sent: Fri 5/15/2009 9:48 AM
> To: oracle-l_at_freelists.org
> Subject: Oracle 10g hangs intermittently waiting for I/O
>
> Hello all.
>
> I have an Oracle 10.2.0.3 data warehouse database on 11.1.0.7 ASM with
> asmlib. RHEL 4.7. Proliant DL585 G2 with MSA70 storage.
>
> The problem I face is an 'I/O hiccup'. The database can work properly
> for a week or two and then suddenly keep stalling for no apparent
> reason. Users complain that their selects take 2x or 3x more time.
> vmstat shows I/O activity (bi, bo columns) for half a minute and for
> another half a minute shows no activity (bi and bo columns equal to 0)
> and a number of processes waiting for I/O (procs/b column). strace on an
> oracle process waiting for I/O shows it is waiting for a completion of
> 'read' call. The only thing that helps is rebooting the box.
>
> I can isolate the problem to specific disks using iostat. These disks
> are the same on a day the problem occurs but they are different on
> another occurrence of the problem. Storage / Linux admins do not see any
> problem on their side.
>
> I have several one-off patches recommended by Oracle support:
>
> Bug 5452672: Hung database instance if linux kernel miss aio request
> Bug 6656824: LNX-10204-TC6 SIGSEGV AT SKGFR_REAP64()+281, IN DBW0
> Bug 6087207: WARNING:ORACLE PROCESS RUNNING OUT OF OS KERNEL I/O RESOURCES
> Bug 6882513 - MERGE LABEL REQUEST ON TOP OF 10.2.0.3 FOR BUGS 6801535
> 5576584
> Bug 5576584 (4880399): ASM PARALLEL READS PERFORMANCE NOT ACCEPTABLE
>
> I plan to upgrade to 10.2.0.4 but need first to sort out some hash join
> bugs (yet unknown to Oracle) that break our large queries with ora-600
> errors.
>
> What would you recommend to do to narrow down the problem to Oracle /
> ASM / asmlib / Linux / storage fault?
>
> Do you know of any other bugs that can show such a behaviour?
>
> Thanks.
>
>
> Pawel Kotlarz
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri May 15 2009 - 10:41:17 CDT