Re: OOM killer terminating database on AWS EC2

From: Fernando Andrade <correo_at_fjandrade.com>
Date: Mon, 13 Jan 2020 17:03:49 -0500
Message-ID: <2fd0e827-3a8c-648d-b977-0816f326d32c_at_fjandrade.com>



Hi Sandy

In AWS you can use SES to send the emails, and CloudWatch to monitor at the process level:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-procstat-process-metrics.html#CloudWatch-Agent-procstat-configuration
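
As a rough sketch of the procstat section described on that page, the snippet below builds a CloudWatch agent config fragment in Python and prints it as JSON. The function name, the regex pattern (guessed from the ora_dbw0_dwprod process name in your message) and the choice of measurements are my assumptions, so please check them against the documentation before using anything like this.

```python
import json


def build_procstat_config(pattern, measurements=("memory_rss", "memory_vms", "pid_count")):
    """Return a CloudWatch agent config dict with a single procstat entry.

    `pattern` is a regex matched against the process command line.
    The default measurement names are assumed from the procstat docs.
    """
    return {
        "metrics": {
            "metrics_collected": {
                "procstat": [
                    {
                        "pattern": pattern,
                        "measurement": list(measurements),
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical pattern: all Oracle background processes of the dwprod instance.
    config = build_procstat_config("ora_.*_dwprod")
    print(json.dumps(config, indent=2))
```

The printed JSON would be merged into the agent's configuration file; where that file lives and how the agent is restarted depends on how the agent was installed.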

FJA

On 1/13/2020 3:44 PM, Mark J. Bobak wrote:
> Hi Sandy,
>
> I know it's (almost certainly) happening *way* above your level, but
> dropping Oracle support on *any* database, let alone a production
> database, is foolishness, and certainly *not* a cost savings, not in
> the long run.....
>
> I run Oracle on EC2, w/ mail enabled, and so far, have never run into
> an OOM situation.  The system has to be *really* low on memory for the
> kernel's OOM killer to wake up and start killing stuff.  When it does,
> Oracle is a big target, because it (almost certainly) is (and should
> be) the big memory consumer on your (EC2) instance.
>
> Some questions:
> 1.)  What instance type(s) are you running?  Do you have instance
> store volumes configured for swap?  Do you have swap configured at
> all?  What is the level of swap usage you are seeing?
> 2.)  How is your Oracle memory usage configured?  Do you have
> hugepages configured?  (Please say yes....)
> 3.)  What do the outputs of 'free -h' and 'top' tell you? How about
> 'vmstat'?  'sar -B'?
>
> -Mark
>
>
> On Mon, Jan 13, 2020 at 2:33 PM Sandra Becker <sbecker6925_at_gmail.com> wrote:
>
> Server:   AWS EC2
> RHEL:   7.6
> Oracle:  12.1.0.2
>
> We have a database on an AWS EC2 server that the OOM killer has
> terminated twice in the last 5 days, both times it was the
> ora_dbw0_dwprod process.  On 1/8 postfix was enabled to allow us
> to email the DBA team through an AWS relay server when a backup
> failed.  We stopped running daily backups and cronjobs that did a
> quick check for expired accounts.  We've left postfix enabled for
> sending emails.  We are searching for answers but have none yet as
> to why this is happening.  We also no longer have Oracle support
> available to us.  (management saving money again).
>
> Questions:
>
> 1. Could postfix be related to the memory issues even though we
> haven't sent any emails since the first crash 5 days ago?
> 2. How can we monitor the memory usage of  an EC2 instance?
> 3. How do you disable the OOM killer in EC2 should we decide to
> go that route?  (we have it disabled on our on-prem servers) 
> The docs I've found so far have not been helpful.
>
> I appreciate any help you can give us or pointing us in the right
> direction.
>
> Thank you,
> --
> Sandy B.
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jan 13 2020 - 23:03:49 CET
