Re: RMAN innocent bystanders killed on linux

From: Christo Kutrovsky <kutrovsky.oracle_at_gmail.com>
Date: Thu, 28 Feb 2008 22:38:54 +0400
Message-ID: <52a152eb0802281038r578cde54u64c4ef4b3084746e@mail.gmail.com>


Hello,

This is a known issue with 2.4 kernels. It has less to do with low memory than with incorrect memory accounting by the OOM killer. It is triggered by large file I/O operations, which use a lot of file system cache.
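
If you want to watch the relevant numbers while the duplicate runs, here is a minimal sketch that pulls the low-memory and page-cache lines out of /proc/meminfo (the LowFree figure Niall mentions below). The function name show_lowmem is made up for illustration, and kernels that do not report LowTotal/LowFree will simply show only the Cached line.

```shell
#!/bin/sh
# Hypothetical helper: print LowTotal, LowFree and Cached in MB.
# Reads /proc/meminfo by default; pass another file in the same format to test.
show_lowmem() {
    src="${1:-/proc/meminfo}"
    # /proc/meminfo values are in kB; convert to whole MB.
    awk '/^(LowTotal|LowFree|Cached):/ { printf "%s %d MB\n", $1, $2 / 1024 }' "$src"
}
```

Running "show_lowmem" every few seconds during the restore should show whether Cached grows while LowFree shrinks.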

Enable direct I/O (filesystemio_options=directIO). With a 2.4 kernel you get either direct I/O or async I/O on ext3 (I am assuming you are using ext3), not both; if you set "setall", async will take precedence.
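
A rough sketch of setting it, assuming the instance uses an spfile, sqlplus is on the PATH, and you can connect as SYSDBA with OS authentication. The Oracle parameter is spelled filesystemio_options; the function name directio_sql is only for illustration. The parameter is static, so the instance has to be restarted before it takes effect. If the auxiliary instance for the duplicate starts from a pfile, put filesystemio_options=directIO in that init.ora instead.

```shell
#!/bin/sh
# Hypothetical helper: emit the SQL that switches the instance to direct I/O.
# filesystemio_options is a static parameter, hence SCOPE = SPFILE.
directio_sql() {
    cat <<'EOF'
ALTER SYSTEM SET filesystemio_options = directIO SCOPE = SPFILE;
EOF
}

# Usage (assumption: OS-authenticated SYSDBA login), then restart the instance:
#   directio_sql | sqlplus -s "/ as sysdba"
```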

Note that this will only help with your duplicate. If you start a "cp", which still goes through the file system cache, some process can still get killed. I believe there is a bug fix for the 2.4 kernel, so make sure you are running the latest 2.4 kernel.

If you really need more info, I can try to look up the kernel that had this issue and the kernel that did not.

-- 
Christo Kutrovsky
DBA Team Lead
The Pythian Group - www.pythian.com
I blog at http://www.pythian.com/blogs/


On Wed, Feb 27, 2008 at 9:40 PM, Niall Litchfield
<niall.litchfield_at_gmail.com> wrote:

> Did you check out
> http://www.pythian.com/blogs/741/pythian-goodies-free-memory-swap-oracle-and-everything
> which gives a couple of possibilities that might match the report from free,
> essentially filesystem cache is a likely culprit.
>
> My understanding of OOM though is that it kicks in when LowFree is very low
> (cat /proc/meminfo |grep Low) - for some values of very low!
>
> I'm on a rather steep learning curve with Linux memory management (and being
> uncomfortably reminded of DOS/16-bit Windows in the process) though, so treat
> with caution.
>
> Niall
>
>
> On Wed, Feb 27, 2008 at 12:34 PM, Howard Latham <howard.latham_at_rsmb.co.uk>
> wrote:
>
>
> >
> >
> > I am trying to duplicate a 10g database to a new host.
> > Host has 28Gig of memory it is running REDHAT Enterprise.
> > It reads approx 10 X 2 gig backup slices then Redhat's out of memory
> > utility kicks in and kills the new database AND an innocent bystander
> > database.
> > Both DBs generate a PMON 471 Error.
> > I watched the memory with free and the process did not use up all the
> > memory -
> > I also have 28Gig of swap.
> >
> > I have logged a TAR but Oracle have gone rather quiet!
> >
> >
> >
> >
> > Howard A. Latham
> > IT Infrastructure Manager
> > RSMB Television Research Ltd,
> > The Communications Building,
> > 48 Leicester Square,
> > London. WC2H 7LT
> > Registered in England 2173860
> >
> > Registered in England No. 3266277
> >
> > Save a tree...Please don't print this email unless you really need to.
> >
> >
> >
> >
> >
> >
> > Tel: +44 (0)20 7808 3619
> > SW: +44 (0)20 7808 3600
> > Fax: +44 (0)20 7839 7446
> >
> > mailto:Howard.latham_at_rsmb.co.uk
> >
> > http://www.rsmb.co.uk
> >
> >
> >
> >
> >
> > ________________________________
> > From: Howard Latham
> > Sent: 26 February 2008 16:36
> > To: 'oracle-l_at_freelists.org'
> > Subject: RMAN
> >
> >
> >
> > I am getting a PMON 471 when duplicating a database is this a bug?
> > its 10g on REDHAT
> >
> >
>
>
>
>
> --
> Niall Litchfield
> Oracle DBA
> http://www.orawin.info
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Feb 28 2008 - 12:38:54 CST
