Re: RMAN innocent bystanders killed on linux
Date: Thu, 28 Feb 2008 15:03:19 -0500
Given the experience we have had recently, I am not 100% sure if this issue is
merely confined to 2.4 kernels. Just to share our recent experience...
A few weeks back we were facing instance crashes on a RAC cluster
(10.2.0.3, linux 2.6.9-184.108.40.206.1.ELsmp), encountered only during the
RMAN runtime window. Subsequent troubleshooting / research led us to reduce
the parallelism / filesperset settings in the RMAN configuration. So far that
has avoided the zero memory/swap scenario we saw in some Oracle trace files,
and we haven't had any instance crashes during the RMAN backup window since
then. During those problematic backup windows, however, the O.S. utilities
continued to show a relatively "normal" system from a memory/swap standpoint.
So, given what we have seen, I would agree w/Christo that it is an issue
associated with large/heavy I/O operations and the filesystem cache.
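
For anyone who wants to log the memory/swap picture during the backup window
rather than eyeball the O.S. utilities, here is a minimal sketch. It is not
what we ran; the function names are made up for illustration, and it assumes a
Linux box where /proc/meminfo is readable. It simply pulls out the free, cached
and swap figures so they can be compared against what the trace files report.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most values are reported in kB; a few (e.g. HugePages_Total) are plain
    counts. Fields whose names contain parentheses are skipped.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def memory_summary(info):
    """Keep only the fields relevant to the free-versus-cache question."""
    keys = ("MemTotal", "MemFree", "Cached", "SwapTotal", "SwapFree")
    return {key: info.get(key, 0) for key in keys}


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling memory_summary(read_meminfo()) every minute or so from a small loop or
cron job during the RMAN window would give a timeline of MemFree, Cached and
SwapFree to line up against the time of any crash.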
On Thu, Feb 28, 2008 at 1:38 PM, Christo Kutrovsky wrote:
> This is a known issue with 2.4 kernels. It has less to do with low
> memory than with incorrect memory accounting by the OOM module.
> It is related to large file I/O operations, which use a lot of file
> system cache.
> Enable DIRECTIO (filesystemio_options=directio). On a 2.4 kernel you get
> either DIRECTIO or ASYNCH for ext3 (I am assuming you are using ext3),
> not both. If you use "setall", async will take precedence.
> Note that this will only help you with your duplicate. If you start a
> "cp", someone will still get killed. I believe there's a bugfix for the
> 2.4 kernel. Make sure you are using the latest 2.4 kernel.
> If you really need more info, I can try to look up the kernel that had
> this issue, and the kernel that did not.
> Christo Kutrovsky
> DBA Team Lead
> The Pythian Group - www.pythian.com
> I blog at http://www.pythian.com/blogs/