Re: OOM killer terminating database on AWS EC2

From: <niall.litchfield_at_gmail.com>
Date: Wed, 15 Jan 2020 17:22:31 +0000
Message-ID: <CABe10sZo7eDs72nNF=iErSUVrOYiFsf4TjSWcN_C-g8QsM6dFQ_at_mail.gmail.com>



If you are just getting started with AWS then I'd suggest a few things

  1. See if your company can spring for some AWS training (A Cloud Guru or
     LinkedIn, for example).
  2. Get hold of some AWS credentials and the AWS CLI
     (https://aws.amazon.com/cli/). Then you can do things like

       aws ec2 describe-instances --profile personal --region eu-west-2

     and get useful information back (in formats not so useful for humans
     :( ), like the extract following, which tells you things like core
     count, AMI, instance type and so on.
  3. The AMI chosen is pretty important for your use case, since it bakes in
     defaults for things like vm.swappiness that may or may not be
     appropriate for Oracle.
  4. As Andrew says, using the cloud doesn't absolve you from ensuring that
     the database configuration matches the VM configuration. Knowing your
     instance types is important here (I tend to suggest M5 or R5 for
     Oracle, but others might have different views).
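
On point 4, here is a rough sketch of the kind of sanity check I mean, using the memory figures Sandy quoted further down this thread. The function name is made up for illustration, and the RAM total is an assumption: top prints "13070175+" with the last digit cut off, so I padded it with a zero. Treat the result as approximate.

```python
# Rough memory budget check: does SGA + PGA fit in the instance's RAM?
# Byte values are the init parameters quoted later in this thread.
GIB = 1024 ** 3

SGA_MAX_SIZE = 107374182400         # bytes (100 GiB)
PGA_AGGREGATE_LIMIT = 32212254720   # bytes (30 GiB, about 32 GB in decimal units)
# Assumption: top shows "13070175+total" (truncated), padded here with a 0.
MEM_TOTAL_KIB = 130701750


def headroom_gib(ram_bytes, sga_bytes, pga_bytes):
    """Return RAM left over (in GiB) after SGA and PGA; negative means overcommitted."""
    return (ram_bytes - sga_bytes - pga_bytes) / GIB


ram_bytes = MEM_TOTAL_KIB * 1024
left = headroom_gib(ram_bytes, SGA_MAX_SIZE, PGA_AGGREGATE_LIMIT)

print(f"RAM      : {ram_bytes / GIB:6.1f} GiB")
print(f"SGA max  : {SGA_MAX_SIZE / GIB:6.1f} GiB")
print(f"PGA limit: {PGA_AGGREGATE_LIMIT / GIB:6.1f} GiB")
print(f"Headroom : {left:6.1f} GiB (before the OS and anything else)")
```

A negative headroom means the database alone is allowed to ask for more memory than the instance has, which fits Andrew's reading of the numbers.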

Hope that helps a bit and good luck

example aws cli output follows - I cut a lot.

 "Instances": [
     {
         "ImageId": "ami-0fa519aa0550762de",
         "InstanceType": "t2.micro",
         "Monitoring": {
             "State": "disabled"
         },
         "Placement": {
             "AvailabilityZone": "eu-west-2a",
         },
         "PublicDnsName": "ec2-35-178-116-210.eu-west-2.compute.amazonaws.com",
         "PublicIpAddress": "35.178.116.210",
         "BlockDeviceMappings": [
             {
                 "DeviceName": "/dev/sda1",
                 "Ebs": {
                     "AttachTime": "2020-01-15T16:41:12.000Z",
                     "DeleteOnTermination": true,
                 }
             }
         ],
         "NetworkInterfaces": [
             {
                 "Association": {
                     "IpOwnerId": "amazon",
                     "PublicDnsName": "ec2-35-178-116-210.eu-west-2.compute.amazonaws.com",
                     "PublicIp": "35.178.116.210"
                 },
                 "Attachment": {
         ...
         "RootDeviceName": "/dev/sda1",
         "RootDeviceType": "ebs",
         "SourceDestCheck": true,
         "VirtualizationType": "hvm",
         "CpuOptions": {
             "CoreCount": 1,
             "ThreadsPerCore": 1
         },
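
Since the raw JSON is hard on the eyes, here is a minimal sketch of boiling it down to one line per instance. The function name and the SAMPLE string are mine, for illustration only. SAMPLE reuses values from the extract above, wrapped in the top-level "Reservations" list that the full describe-instances output has but my extract does not show.

```python
import json


def summarise_instances(describe_output):
    """Return one summary line per instance found in describe-instances JSON text."""
    doc = json.loads(describe_output)
    rows = []
    for reservation in doc.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            cpu = inst.get("CpuOptions", {})
            placement = inst.get("Placement", {})
            rows.append(
                "type={} ami={} az={} cores={} threads_per_core={} root={}".format(
                    inst.get("InstanceType", "?"),
                    inst.get("ImageId", "?"),
                    placement.get("AvailabilityZone", "?"),
                    cpu.get("CoreCount", "?"),
                    cpu.get("ThreadsPerCore", "?"),
                    inst.get("RootDeviceType", "?"),
                )
            )
    return rows


# Hypothetical, heavily trimmed input built from the values in the extract above.
SAMPLE = """
{"Reservations": [{"Instances": [{
    "ImageId": "ami-0fa519aa0550762de",
    "InstanceType": "t2.micro",
    "Placement": {"AvailabilityZone": "eu-west-2a"},
    "RootDeviceType": "ebs",
    "CpuOptions": {"CoreCount": 1, "ThreadsPerCore": 1}
}]}]}
"""

for line in summarise_instances(SAMPLE):
    print(line)
```

To run it against live output, read sys.stdin instead of SAMPLE and pipe the aws ec2 describe-instances command into the script.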

On Tue, Jan 14, 2020 at 8:42 PM Sandra Becker <sbecker6925_at_gmail.com> wrote:

> Thanks, Andrew.  Swap did seem small to me at first glance.  Glad to have
> that confirmation.  Will take a closer look at the memory parameters as
> well.
>
>
>
> On Tue, Jan 14, 2020 at 1:13 PM Andrew Kerber <andrew.kerber_at_gmail.com>
> wrote:
>
>> By my count you are definitely short on memory: 100G to SGA, 32G to PGA,
>> and 124G on the server.  Your swap is also underconfigured; it should be
>> 16G.
>>
>> On Tue, Jan 14, 2020 at 2:02 PM Sandra Becker <sbecker6925_at_gmail.com>
>> wrote:
>>
>>> Mark,
>>>
>>> It's been a busy morning, starting with the page just before midnight.
>>> Spinnaker called me and got my account activated, so I've opened a ticket
>>> with them.  We'll see what they come back with.  I wish I had better
>>> answers to your questions.  I'm pretty new to AWS and had no training to
>>> speak of.  Having to cover for the DBA that left 2 weeks ago and being
>>> ignorant about the configuration is not a happy place.
>>>
>>> 1.  I don't know how storage was configured and am not sure how to tell if
>>> it uses instance store volumes. Yes, we have swap configured.  Lots of free
>>> swap, so I don't think that's an issue.
>>> 2.  Yes, we have huge pages configured.  Oracle memory usage - are you
>>> referring to sga/pga?  If yes,  sga_target=96636764160,
>>> sga_max_size=107374182400, pga_aggregate_limit=32212254720,
>>> pga_aggregate_target=32212254720
>>> 3.  free -h:
>>>               total        used        free      shared  buff/cache   available
>>> Mem:           124G         19G        1.2G         86G        103G         17G
>>> Swap:          5.0G        2.1G        2.9G
>>>
>>> top:
>>> top - 19:59:10 up 4 days, 20:10,  2 users,  load average: 0.11, 0.08, 0.13
>>> Tasks: 385 total,   1 running, 384 sleeping,   0 stopped,   0 zombie
>>> %Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>>> KiB Mem : 13070175+total,  1245316 free, 20508408 used, 10894803+buff/cache
>>> KiB Swap:  5242876 total,  3013140 free,  2229736 used. 18428996 avail Mem
>>>
>>>
>>>
>>> Sandy
>>>
>>> On Mon, Jan 13, 2020 at 1:44 PM Mark J. Bobak <mark_at_bobak.net> wrote:
>>>
>>>> Hi Sandy,
>>>>
>>>> I know it's (almost certainly) happening *way* above your level, but
>>>> dropping Oracle support on *any* database, let alone a production database,
>>>> is foolishness, and certainly *not* a cost savings, not in the long run.....
>>>>
>>>> I run Oracle on EC2, w/ mail enabled, and so far, have never run into
>>>> an OOM situation.  The system has to be *really* low on memory for the
>>>> kernel's OOM killer to wake up and start killing stuff.  When it does,
>>>> Oracle is a big target, because it (almost certainly) is (and should be)
>>>> the big memory consumer on your (EC2) instance.
>>>>
>>>> Some questions:
>>>> 1.)  What instance type(s) are you running?  Do you have instance store
>>>> volumes configured for swap?  Do you have swap configured at all?  What is
>>>> the level of swap usage you are seeing?
>>>> 2.)  How is your Oracle memory usage configured?  Do you have hugepages
>>>> configured?  (Please say yes....)
>>>> 3.)  What do the outputs of 'free -h' and 'top' tell you?  How about
>>>> 'vmstat'?  'sar -B'?
>>>>
>>>> -Mark
>>>>
>>>>
>>>> On Mon, Jan 13, 2020 at 2:33 PM Sandra Becker <sbecker6925_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Server:   AWS EC2
>>>>> RHEL:   7.6
>>>>> Oracle:  12.1.0.2
>>>>>
>>>>> We have a database on an AWS EC2 server that the OOM killer has
>>>>> terminated twice in the last 5 days, both times it was the ora_dbw0_dwprod
>>>>> process.  On 1/8 postfix was enabled to allow us to email the DBA team
>>>>> through an AWS relay server when a backup failed.  We stopped running daily
>>>>> backups and cronjobs that did a quick check for expired accounts.  We've
>>>>> left postfix enabled for sending emails.  We are searching for answers but
>>>>> have none yet as to why this is happening.  We also no longer have Oracle
>>>>> support available to us.  (management saving money again).
>>>>>
>>>>> Questions:
>>>>>
>>>>>    1. Could postfix be related to the memory issues even though we
>>>>>    haven't sent any emails since the first crash 5 days ago?
>>>>>    2. How can we monitor the memory usage of  an EC2 instance?
>>>>>    3. How do you disable the OOM killer in EC2 should we decide to go
>>>>>    that route?  (we have it disabled on our on-prem servers)  The docs I've
>>>>>    found so far have not been helpful.
>>>>>
>>>>> I appreciate any help you can give us or pointing us in the right
>>>>> direction.
>>>>>
>>>>> Thank you,
>>>>> --
>>>>> Sandy B.
>>>>>
>>>>>
>>>
>>> --
>>> Sandy B.
>>>
>>>
>>
>> --
>> Andrew W. Kerber
>>
>> 'If at first you dont succeed, dont take up skydiving.'
>>
>
>
> --
> Sandy B.
>
>

-- 
Niall Litchfield
Oracle DBA
http://www.orawin.info

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jan 15 2020 - 18:22:31 CET
