RE: How do you detect memory issues ?

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 12 Dec 2018 16:07:23 -0500
Message-ID: <018901d4925e$c6027d20$52077760$_at_rsiz.com>



looks pretty good to me.  

A red flag to me is HEAVY page-ins continuously (as per your explanation, warm-starting an instance once in a while might be heavy, but not continuous).  

The last time this was important on anything I was tuning was ages ago, when memory was small compared to need and folks were constantly pushing to pre-allocate everything they had. Then something trivial like buffering output files to the filesystem caused page-outs of something that then had to be paged back in for a program to run, in a chronic tight circle.  

I’m very glad that is distant in my rear-view mirror. Might have been UNIX instead of Linux, but it is the same issue.  

It still can be real, it just no longer seems to reach me. Prevention is probably cheaper than diagnosis, even if you waste a bit, er, a few gigabytes:  

  1. Have more memory than you need – it is cheap, especially compared to making an Oracle-licensed CPU wait.
  2. Allocate a couple of granules more to the shared pool than ADDM tells you you need.
  3. See item 2 regarding the buffer cache.
  4. I’m not an expert at sizing In-Memory, but holy cow, I don’t think you want that to even be pageable.
  5. Don’t allocate the entire set of memory you have to hugepages that can’t be used for anything else (see the sketch right after this list).
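
By the way, a quick sketch of the kind of check I mean for that last point (it only reads /proc/meminfo; the "more than half of RAM" warning is just an illustrative number, not a recommendation):

#!/usr/bin/env python3
# Report how much of physical RAM is reserved for hugepages and how much
# of that reservation is actually in use. Values come from /proc/meminfo.

def meminfo():
    """Parse /proc/meminfo into a dict of {key: value}."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            # most values are in kB; the HugePages_* counters have no unit
            info[key] = int(rest.strip().split()[0])
    return info

m = meminfo()
page_kb  = m["Hugepagesize"]                  # size of one hugepage, in kB
total_kb = m["HugePages_Total"] * page_kb     # reserved for hugepages
free_kb  = m["HugePages_Free"] * page_kb      # reserved but currently unused
mem_kb   = m["MemTotal"]

print(f"RAM total        : {mem_kb/1024/1024:6.1f} GB")
print(f"Hugepages total  : {total_kb/1024/1024:6.1f} GB ({100*total_kb/mem_kb:.0f}% of RAM)")
print(f"Hugepages unused : {free_kb/1024/1024:6.1f} GB")

# Illustrative sanity check only: hugepages can't be used for PGA, page cache,
# or anything else, so a very large reservation deserves a second look.
if total_kb > 0.5 * mem_kb:
    print("NOTE: more than half of RAM is reserved for hugepages")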

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Martin Berger
Sent: Wednesday, December 12, 2018 2:21 PM
To: Kyle Hailey
Cc: Oracle-L oracle-l
Subject: Re: How do you detect memory issues ?  

Hi Kyle,  

I'm not a pro regarding Linux memory management, but we have to discuss this question with our application OPS teams regularly.

My explanations might not be 100% correct, but they should be sufficient to give a good picture (and to show why it's so complicated to answer your question).  

There is a simple reason why, on a Linux system, "free" will go down towards zero after some time: Linux tries to use all memory as well as possible. If it is not needed for anything else, filesystem cache is a good use. That is not a problem at all: if memory is needed, some FS cache is repurposed for something "better".  
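
A minimal sketch of that point (assuming a kernel new enough to report MemAvailable in /proc/meminfo, i.e. 3.14 or later): "free" can be close to zero while most of the memory is just reclaimable cache.

#!/usr/bin/env python3
# Show why a near-zero "free" value is not alarming on its own:
# MemAvailable estimates how much memory the kernel could reclaim
# (mostly page cache) without swapping, and is the number that matters.

def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])  # kB
    return info

m = meminfo()
gb = lambda kb: kb / 1024 / 1024
print(f"MemTotal       : {gb(m['MemTotal']):6.1f} GB")
print(f"MemFree        : {gb(m['MemFree']):6.1f} GB   <- tends toward zero over time")
print(f"Buffers+Cached : {gb(m['Buffers'] + m['Cached']):6.1f} GB   <- mostly reclaimable FS cache")
print(f"MemAvailable   : {gb(m['MemAvailable']):6.1f} GB   <- what is really usable")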

Paging is also not a bad thing in itself.
To stay in Oracle terms: imagine a new instance is started from a new ORACLE_HOME (from which no other instance has been started before). Linux needs to load the oracle binaries and a lot of libraries into memory and execute the binary, which then allocates even more memory (SGA, etc.).

After some time, the Linux kernel might identify some memory pages in the oracle binary or its libraries which are not used anymore (to make it "obvious", let's say it is the code responsible for the startup of an instance). The kernel might decide to page out these memory pages. That will not affect the instance at all, as it doesn't need that code anymore. You will see some page-outs, but they don't hurt at all and they free up some memory for better usage.

Now another instance starts from the same O_H. It needs to load all the binaries & libraries. Fortunately, the first instance has loaded them already, so the 2nd instance can use the same memory pages and doesn't even have to read them from disk. Much faster! Still, it needs those program pages which were paged out before, as they are required at instance startup, so you will see some page-ins. Anything bad here? Not at all! The 2nd instance still starts faster, as the majority of the binaries are already available, and those which need to be paged in have already been processed (from file on the filesystem to memory structures), so less work is required and it's faster.

Of course, after some time, these pages will be paged out again, as they are no longer required.  
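
If you want to see this sharing on a running system, here is a rough sketch (nothing Oracle-specific: it just sums the Shared_* and Private_* lines from /proc/<pid>/smaps, needs suitable privileges, and the PID is whatever process you point it at, for example a background process of each instance):

#!/usr/bin/env python3
# Sum shared vs. private memory of one process from /proc/<pid>/smaps.
# For two instances started from the same ORACLE_HOME you would expect a
# large Shared_Clean figure: the binary/library pages are mapped only once.
import sys
from collections import defaultdict

pid = sys.argv[1] if len(sys.argv) > 1 else "self"   # pass a PID on the command line
totals = defaultdict(int)
with open(f"/proc/{pid}/smaps") as f:
    for line in f:
        if line.startswith(("Shared_Clean", "Shared_Dirty",
                            "Private_Clean", "Private_Dirty", "Swap:")):
            key, rest = line.split(":", 1)
            totals[key] += int(rest.strip().split()[0])   # kB

for key, kb in sorted(totals.items()):
    print(f"{key:14s}: {kb/1024:8.1f} MB")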

Based on this simplified explanation, page-ins or page-outs are not a problem by themselves. They are only an issue if "hot" (highly used) pages are going out & in on a regular basis during execution, so that the application has to wait for them. Unfortunately I don't know an algorithm which can tell whether a page-in is good or bad. Maybe something like "count of blocks which are paged out & in within a given time" could be used, but I haven't seen it implemented.  
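
If someone wanted to approximate that idea, it could look like the sketch below. This is purely illustrative: it samples the pswpin/pswpout counters from /proc/vmstat, and the 60-second window and 10 pages/s thresholds are numbers I made up, not recommendations.

#!/usr/bin/env python3
# Rough detector for the "hot pages going out AND back in" pattern:
# sample swap-in/swap-out counters from /proc/vmstat and flag intervals
# where both keep moving. Sustained page-outs alone may be harmless
# (cold code being evicted); sustained out AND in suggests real pressure.
import time

def swap_counters():
    """Return cumulative (pages swapped in, pages swapped out) since boot."""
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            name, value = line.split()
            if name in ("pswpin", "pswpout"):
                counters[name] = int(value)
    return counters["pswpin"], counters["pswpout"]

INTERVAL = 60          # seconds per sample window (illustrative)
IN_RATE_LIMIT = 10     # pages/s swapped in  (illustrative threshold)
OUT_RATE_LIMIT = 10    # pages/s swapped out (illustrative threshold)

prev_in, prev_out = swap_counters()
while True:                                 # run forever; Ctrl-C to stop
    time.sleep(INTERVAL)
    cur_in, cur_out = swap_counters()
    in_rate = (cur_in - prev_in) / INTERVAL
    out_rate = (cur_out - prev_out) / INTERVAL
    prev_in, prev_out = cur_in, cur_out
    print(f"swap-in {in_rate:6.1f} pages/s   swap-out {out_rate:6.1f} pages/s")
    if in_rate > IN_RATE_LIMIT and out_rate > OUT_RATE_LIMIT:
        print("WARNING: sustained swap-in AND swap-out: possible memory pressure")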

The same is true for many other situations.

Even estimating whether an application will still fit into memory is not easy, as it's not only about the binary and its memory structures (which you would need to know in advance) - all the shared libraries it requires might be loaded already.  

These examples only covered the simple situations. There are even more complex scenarios.  

My summary: you need to understand your question in extreme detail to be able to answer it, even to some degree.  

To all those who see flaws in my explanation: I'm sorry, and please explain where I went wrong, so we all can learn!  

hth & sorry for the long post,

 Martin    

On Thu, Dec 6, 2018 at 01:46, Kyle Hailey <kylelf_at_gmail.com> wrote:

One of those questions that seems like it should have been nailed down 20 years ago, but it still seems to lack a clear answer.  

How do you detect memory issues ?  

I always used "po" or "paged outs". Now on Amazon Linux I don't see "po" but there is "bo" (blocks written out). In past, at least on OSF & Ultrix, page outs were a sign of needed memory that was written out to disk and when I needed that memory it would take a big performance hit to read it in. Thus "po" was a good canary on the coal mine. Any consistent values over over say 10 were a sign.  

Some people use "scan rate" but I never found that as easy to interpret as page outs. Again what values would you use  

Some suggest using freeable memory as a yardstick, where freeable is "free" + "cached", or MemFree + Cached + Inactive. Even in this case, what values would you alert on?  
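
One possible shape of such a check, just for illustration (the 10% floor is an arbitrary value, and note that Cached and Inactive overlap, so the sum can overshoot):

#!/usr/bin/env python3
# One possible alert rule on "freeable" memory (MemFree + Cached + Inactive,
# as in the formula above), compared against the kernel's own MemAvailable
# estimate. The 10% floor is purely an illustrative value to alert on.
fields = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, rest = line.split(":", 1)
        fields[key] = int(rest.strip().split()[0])   # kB

freeable_kb  = fields["MemFree"] + fields["Cached"] + fields["Inactive"]
available_kb = fields.get("MemAvailable", freeable_kb)   # kernel >= 3.14
total_kb     = fields["MemTotal"]

# Cached and Inactive(file) overlap, so "freeable" can exceed 100% of RAM.
print(f"freeable  : {freeable_kb/total_kb:5.1%} of RAM")
print(f"available : {available_kb/total_kb:5.1%} of RAM (kernel's estimate)")
if available_kb < 0.10 * total_kb:      # illustrative 10% threshold
    print("ALERT: less than 10% of RAM readily available")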

I've always ignored swap stats, since if you are swapping it is already too late.  

What do you use to detect memory issues ?  

Kyle

--
http://www.freelists.org/webpage/oracle-l