Re: OOM killer terminating database on AWS EC2

From: Sandra Becker <sbecker6925_at_gmail.com>
Date: Tue, 14 Jan 2020 13:40:02 -0700
Message-ID: <CAJzM94AwmtRffyju8-=m4VDXZb+5++fmy283OjXqDxCtT3Yjpg_at_mail.gmail.com>



Thanks, Andrew. Swap did seem small to me at first glance. Glad to have that confirmation. Will take a closer look at the memory parameters as well.

On Tue, Jan 14, 2020 at 1:13 PM Andrew Kerber <andrew.kerber_at_gmail.com> wrote:

> By my count you are definitely short on memory: 100G to the SGA and 32G
> to the PGA, but only 124G on the server. Your swap is also
> underconfigured; it should be 16G.
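Andrew's count can be reproduced from the byte values Sandy posts further down in this thread, converted to GiB (the unit `free -h` prints as G). sga_max_size is exactly 100 GiB and pga_aggregate_limit is exactly 30 GiB. That is about 32.2 GB in decimal units, which is probably where a 32G figure comes from. Either way, the worst case is about 130 GiB against 124 GiB of RAM. The swap figure matches the table in Oracle's Linux installation guide as I recall it: 1.5x RAM up to 2 GB, equal to RAM up to 16 GB, and 16 GB above that, with HugePages memory deducted from RAM first. Treat those thresholds as an assumption and check the guide for your release. A minimal sketch; the constant and function names are illustrative:

```python
GIB = 1024 ** 3

# Parameter values quoted in this thread, in bytes.
SGA_MAX_SIZE = 107374182400
PGA_AGGREGATE_LIMIT = 32212254720
RAM_GIB = 124  # "total" from `free -h`


def to_gib(n_bytes: int) -> float:
    """Bytes to GiB, the unit `free -h` prints as G."""
    return n_bytes / GIB


def recommended_swap_gib(ram_gib: float, hugepages_gib: float = 0.0) -> float:
    """Swap size per the Oracle install-guide table (assumed thresholds):
    1.5x RAM up to 2G, equal to RAM up to 16G, 16G above that.
    Memory reserved for HugePages is deducted from RAM first."""
    effective = max(ram_gib - hugepages_gib, 0.0)
    if effective <= 2:
        return 1.5 * effective
    if effective <= 16:
        return effective
    return 16.0


if __name__ == "__main__":
    worst_case = to_gib(SGA_MAX_SIZE) + to_gib(PGA_AGGREGATE_LIMIT)
    print(f"SGA max {to_gib(SGA_MAX_SIZE):.0f}G + PGA limit "
          f"{to_gib(PGA_AGGREGATE_LIMIT):.0f}G = {worst_case:.0f}G "
          f"vs {RAM_GIB}G RAM")
    print(f"recommended swap: {recommended_swap_gib(RAM_GIB):.0f}G")
```

pga_aggregate_target is only a target. pga_aggregate_limit (new in 12c) is the hard cap, so the limit is the right number for a worst-case budget.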
>
> On Tue, Jan 14, 2020 at 2:02 PM Sandra Becker <sbecker6925_at_gmail.com>
> wrote:
>
>> Mark,
>>
>> It's been a busy morning, starting with the page just before midnight.
>> Spinnaker called me and got my account activated, so I've opened a ticket
>> with them. We'll see what they come back with. I wish I had better
>> answers to your questions. I'm pretty new to AWS and had no training to
>> speak of. Having to cover for the DBA who left 2 weeks ago while being
>> ignorant of the configuration is not a happy place.
>>
>> 1. I don't know how storage was configured, and I'm not sure how to tell
>> whether it's on instance store volumes. Yes, we have swap configured. Lots
>> of free swap, so I don't think that's an issue.
>> 2. Yes, we have huge pages configured. Oracle memory usage: are you
>> referring to SGA/PGA? If yes:
>>    sga_target=96636764160
>>    sga_max_size=107374182400
>>    pga_aggregate_limit=32212254720
>>    pga_aggregate_target=32212254720
>> 3. free -h:
>>               total        used        free      shared  buff/cache   available
>> Mem:           124G         19G        1.2G         86G        103G         17G
>> Swap:          5.0G        2.1G        2.9G
>>
>> top:
>> top - 19:59:10 up 4 days, 20:10,  2 users,  load average: 0.11, 0.08, 0.13
>> Tasks: 385 total,   1 running, 384 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>> KiB Mem : 13070175+total,  1245316 free, 20508408 used, 10894803+buff/cache
>> KiB Swap:  5242876 total,  3013140 free,  2229736 used. 18428996 avail Mem
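A note on the huge pages answer. `free` counts a preallocated huge page pool under "used", while SGA memory on ordinary pages shows up under "shared" and "buff/cache". With 86G shared and only 19G used in the output above, it looks possible that much of the SGA is not actually in huge pages. One way to check is to compare the HugePages_* lines of /proc/meminfo with sga_max_size. A minimal sketch; the function names are illustrative:

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo. Values with a "kB" suffix stay in kB;
    the HugePages_* fields are page counts."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def hugepages_gib(info: dict[str, int]) -> tuple[float, float]:
    """Return (pool size, committed) in GiB for the huge page pool.

    Committed = pages already handed out (Total - Free) plus pages reserved
    for mappings but not yet faulted in (Rsvd, still counted in Free).
    """
    page_kib = info.get("Hugepagesize", 0)
    total = info.get("HugePages_Total", 0)
    free = info.get("HugePages_Free", 0)
    rsvd = info.get("HugePages_Rsvd", 0)
    gib_per_page = page_kib / 1024**2
    return total * gib_per_page, (total - free + rsvd) * gib_per_page
```

For example, `hugepages_gib(parse_meminfo(open("/proc/meminfo").read()))`. A pool smaller than sga_max_size (100G here) would mean at least part of the SGA is on regular pages.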
>>
>>
>>
>> Sandy
>>
>> On Mon, Jan 13, 2020 at 1:44 PM Mark J. Bobak <mark_at_bobak.net> wrote:
>>
>>> Hi Sandy,
>>>
>>> I know it's (almost certainly) happening *way* above your level, but
>>> dropping Oracle support on *any* database, let alone a production database,
>>> is foolishness, and certainly *not* a cost savings, not in the long run.....
>>>
>>> I run Oracle on EC2, w/ mail enabled, and so far, have never run into an
>>> OOM situation. The system has to be *really* low on memory for the
>>> kernel's OOM killer to wake up and start killing stuff. When it does,
>>> Oracle is a big target, because it (almost certainly) is (and should be)
>>> the big memory consumer on your (EC2) instance.
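The kernel exposes the score it uses to pick a victim as /proc/<pid>/oom_score. A higher score means the process is more likely to be killed, and /proc/<pid>/oom_score_adj (from -1000 to 1000) biases it. A minimal sketch that lists the likeliest victims, which should show whether the Oracle background processes sit at the top; the function names are illustrative:

```python
from __future__ import annotations

from pathlib import Path


def read_oom_scores(proc: Path = Path("/proc")) -> list[tuple[int, int, str]]:
    """Return (oom_score, pid, command) for every process we can read."""
    rows = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric entries are processes
        try:
            score = int((entry / "oom_score").read_text())
            comm = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or is not readable
        rows.append((score, int(entry.name), comm))
    return rows


def top_targets(rows: list[tuple[int, int, str]], n: int = 10) -> list[tuple[int, int, str]]:
    """Highest oom_score first: the OOM killer's most likely victims."""
    return sorted(rows, reverse=True)[:n]
```

For example, `for score, pid, comm in top_targets(read_oom_scores()): print(score, pid, comm)`.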
>>>
>>> Some questions:
>>> 1.) What instance type(s) are you running? Do you have instance store
>>> volumes configured for swap? Do you have swap configured at all? What is
>>> the level of swap usage you are seeing?
>>> 2.) How is your Oracle memory usage configured? Do you have hugepages
>>> configured? (Please say yes....)
>>> 3.) What do the outputs of 'free -h' and 'top' tell you? How about
>>> 'vmstat'? 'sar -B'?
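EC2's standard CloudWatch metrics do not include memory usage from inside the guest. The CloudWatch agent can publish it, or you can sample it locally. The following is a minimal local sketch, assuming a Linux /proc layout. It reads the cumulative pswpin, pswpout and pgmajfault counters from /proc/vmstat, which are roughly what vmstat's si/so columns and sar -B's majflt/s report. It also reads MemAvailable from /proc/meminfo. The function names are illustrative:

```python
from __future__ import annotations

import time
from pathlib import Path

# Cumulative /proc/vmstat counters (in pages) worth watching before an OOM:
# swap-ins, swap-outs and major page faults.
WATCH = ("pswpin", "pswpout", "pgmajfault")


def parse_vmstat(text: str) -> dict[str, int]:
    """Pick the WATCH counters out of /proc/vmstat ("name value" lines)."""
    out = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in WATCH:
            out[name] = int(value)
    return out


def per_second(before: dict[str, int], after: dict[str, int],
               seconds: float) -> dict[str, float]:
    """Turn two snapshots of cumulative counters into per-second rates."""
    return {k: (after[k] - before[k]) / seconds for k in before if k in after}


def mem_available_gib(meminfo_text: str) -> float:
    """MemAvailable from /proc/meminfo (reported in kB), as GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024**2
    raise ValueError("MemAvailable not found")


def sample(interval: float = 60.0) -> str:
    """One log line: available memory plus paging rates over `interval` s."""
    before = parse_vmstat(Path("/proc/vmstat").read_text())
    time.sleep(interval)
    after = parse_vmstat(Path("/proc/vmstat").read_text())
    avail = mem_available_gib(Path("/proc/meminfo").read_text())
    rates = per_second(before, after, interval)
    fields = " ".join(f"{k}/s={v:.1f}" for k, v in rates.items())
    return f"{time.strftime('%Y-%m-%d %H:%M:%S')} avail={avail:.1f}GiB {fields}"
```

For example, call `print(sample(60))` from a loop or a cron job. The pattern to watch before an OOM kill is rising swap and major-fault rates while MemAvailable falls.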
>>>
>>> -Mark
>>>
>>>
>>> On Mon, Jan 13, 2020 at 2:33 PM Sandra Becker <sbecker6925_at_gmail.com>
>>> wrote:
>>>
>>>> Server: AWS EC2
>>>> RHEL: 7.6
>>>> Oracle: 12.1.0.2
>>>>
>>>> We have a database on an AWS EC2 server that the OOM killer has
>>>> terminated twice in the last 5 days; both times the victim was the
>>>> ora_dbw0_dwprod process. On 1/8 postfix was enabled so we could email the
>>>> DBA team through an AWS relay server when a backup failed. We stopped
>>>> running the daily backups and the cron jobs that did a quick check for
>>>> expired accounts, but we've left postfix enabled for sending email. We
>>>> are searching for answers but have none yet as to why this is happening.
>>>> We also no longer have Oracle support available to us (management saving
>>>> money again).
>>>>
>>>> Questions:
>>>>
>>>> 1. Could postfix be related to the memory issues even though we
>>>> haven't sent any emails since the first crash 5 days ago?
>>>> 2. How can we monitor the memory usage of an EC2 instance?
>>>> 3. How do you disable the OOM killer in EC2, should we decide to go
>>>> that route? (We have it disabled on our on-prem servers.) The docs I've
>>>> found so far have not been helpful.
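On Linux there are two common ways to keep the OOM killer from firing. The first is `vm.overcommit_memory=2`, which enables strict commit accounting so that allocations fail up front instead of the kernel killing a process later. The second is writing -1000 to /proc/<pid>/oom_score_adj for specific processes. Before choosing strict overcommit, it is worth computing the commit limit it would impose. The kernel's /proc/meminfo documentation gives the formula CommitLimit = (RAM - huge pages) * vm.overcommit_ratio / 100 + swap. A minimal sketch with this server's approximate figures, assuming the kernel default ratio of 50; the function name is illustrative:

```python
def commit_limit_gib(ram_gib: float, hugepages_gib: float,
                     swap_gib: float, overcommit_ratio: int = 50) -> float:
    """CommitLimit under vm.overcommit_memory=2, per the kernel's
    /proc/meminfo documentation:
    (RAM - huge pages) * overcommit_ratio / 100 + swap."""
    return (ram_gib - hugepages_gib) * overcommit_ratio / 100 + swap_gib


if __name__ == "__main__":
    # 124G RAM and 5G swap from `free -h`; no huge pages assumed.
    print(f"{commit_limit_gib(124, 0, 5):.1f}")       # 67.0
    # Same RAM with a 16G swap and overcommit_ratio=90.
    print(f"{commit_limit_gib(124, 0, 16, 90):.1f}")  # 127.6
```

The kernel reports the live values as CommitLimit and Committed_AS in /proc/meminfo. If Committed_AS already exceeds the limit that a strict setting would impose, switching to vm.overcommit_memory=2 would make new allocations fail rather than fix the shortage.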
>>>>
>>>> I'd appreciate any help you can give us, or a pointer in the right
>>>> direction.
>>>>
>>>> Thank you,
>>>> --
>>>> Sandy B.
>>>>
>>>>
>>
>> --
>> Sandy B.
>>
>>
>
> --
> Andrew W. Kerber
>
> 'If at first you don't succeed, don't take up skydiving.'
>

-- 
Sandy B.

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 14 2020 - 21:40:02 CET
