Re: How do you detect memory issues ?

From: Martin Berger <martin.a.berger_at_gmail.com>
Date: Wed, 12 Dec 2018 20:20:55 +0100
Message-ID: <CALH8A92Urnk=qZCwKasfc1zdofT5cTzp_hOvWfyxP_7UPLVciA_at_mail.gmail.com>



Hi Kyle,

I'm no expert on Linux memory management, but we have to discuss this question with our application ops teams regularly. My explanations might not be 100% correct, but they should be sufficient to give a good picture (and to show why your question is so complicated to answer).

There is a simple reason why "free" goes down towards zero on a Linux system after some time: Linux tries to use all memory as well as possible. If memory is not needed for anything else, filesystem cache is a good use for it. That is not a problem at all: if memory is needed, some of the FS cache is dropped and the memory is used for something "better".
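
To make that concrete, here is a minimal sketch in Python (my choice of language, nothing Oracle-specific) that parses /proc/meminfo and compares MemFree with a rough "freeable" figure. The function names are made up for illustration. MemFree + Cached is only an approximation, as not all cache can be dropped; where the kernel exports MemAvailable, the sketch uses that instead.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def freeable_kb(info):
    """Rough estimate of memory that could be handed to applications.

    Prefers the kernel's own MemAvailable estimate; falls back to
    MemFree + Cached, which is an assumption and tends to overestimate.
    """
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info.get("MemFree", 0) + info.get("Cached", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linux box, freeable_kb(read_meminfo()) will usually be far larger than MemFree, which is exactly why a low "free" value on its own says very little.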

Paging is also nothing bad in itself:
To stay in Oracle terms: imagine a new instance is started from a new ORACLE_HOME (from which no other instance has been started before). Linux needs to load the oracle binary and a lot of libraries into memory and execute the binary, which then allocates even more memory (SGA, etc.). After some time, the Linux kernel might identify some memory pages of the oracle binary or its libraries which are not used anymore. (To make it "obvious", let's say that is the code responsible for instance startup.) The kernel might decide to page out these memory pages. This will not affect the instance at all, as it doesn't need that code anymore. You will see some page-outs, but they don't hurt at all and they free some memory for better use.
Now another instance starts from the same O_H. It needs to load all the binaries & libraries. Fortunately, the first instance has loaded them already, so the 2nd instance can use the same memory pages and doesn't even have to read them from disk. Much faster! It still needs those program pages which were paged out before, as they are required at instance startup, so you will see some page-ins. Anything bad here? Not at all! The 2nd instance still starts faster, as the majority of the binaries are already in memory, and the pages which need to be paged in are already processed (from file on the filesystem to memory structures), so less work is required and it's faster.
Of course, after some time, these pages will be paged out again, as they are no longer required.

Based on this simplified explanation, page-ins and page-outs are not a problem in themselves. They are only an issue if "hot" (heavily used) pages go out & in on a regular basis during execution, so that the application has to wait for them. Unfortunately I don't know an algorithm which can tell whether a page-in is good or bad. Maybe something like "count of blocks which are paged out & in within a given time" could be used, but I haven't seen it implemented.
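
A very rough approximation of that idea, as a Python sketch: take two snapshots of /proc/vmstat and compute per-second rates for the pswpin, pswpout and pgmajfault counters. This is an assumption-laden illustration, not an existing tool. It only sees system-wide totals, so it cannot tell whether the same pages go out and come back in. The function names and the threshold of 10 per second are arbitrary placeholders.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def rates(before, after, seconds,
          names=("pswpin", "pswpout", "pgmajfault")):
    """Per-second rate of each counter between two snapshots."""
    return {n: (after.get(n, 0) - before.get(n, 0)) / seconds for n in names}


def looks_like_thrashing(r, threshold=10.0):
    """Hypothetical rule: swap-in AND swap-out both above the threshold.

    The threshold is a placeholder; a real limit depends on the system.
    """
    return r["pswpin"] > threshold and r["pswpout"] > threshold
```

In practice you would read /proc/vmstat twice, some seconds apart, and only react if the rule fires over several consecutive intervals. A single burst, like the instance startup described above, is harmless.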

Something similar is true for many other situations. Even estimating whether an application will still fit into memory is not that easy: it's not only the binary and its memory structures (which you need to know in advance), as some of the shared libraries it requires may already be loaded.
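
One way to see the effect of sharing, sketched in Python under the assumption that /proc/<pid>/smaps is readable for the process in question: sum the Rss and Pss lines. Rss counts every resident page in full, while Pss divides each shared page among the processes mapping it, so a large gap between the two hints at heavily shared binaries and libraries. The function name is made up for illustration.

```python
def sum_smaps(text):
    """Sum the Rss and Pss fields (in kB) over all mappings in smaps text."""
    totals = {"Rss": 0, "Pss": 0}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        # Mapping header lines and other fields (Size, SwapPss, ...) are skipped.
        if sep and name in totals:
            totals[name] += int(rest.split()[0])
    return totals
```

For a running process you would feed it the contents of /proc/<pid>/smaps. The numbers describe the current state only and say nothing about what a not-yet-started application will need.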

These examples only show the simple situations. There might be even more complex scenarios.

My summary: you need to understand your question in extreme detail to be able to answer it, even to some degree.

All those who saw flaws in my explanation: I'm sorry & please explain where I failed, so we all can learn!

hth & sorry for the long post,
 Martin

On Thu, Dec 6, 2018 at 01:46, kyle Hailey <kylelf_at_gmail.com> wrote:

> One of those questions that seems like it should have been nailed down 20
> years ago but it still seems to lack a clear answer
>
> How do you detect memory issues ?
>
> I always used "po" or "paged outs". Now on Amazon Linux I don't see
> "po" but there is "bo" (blocks written out). In the past, at least on OSF &
> Ultrix, page outs were a sign of needed memory that was written out to disk
> and when I needed that memory it would take a big performance hit to read
> it in. Thus "po" was a good canary in the coal mine. Any consistent values
> over say 10 were a sign.
>
> Some people use "scan rate" but I never found that as easy to interpret
> as page outs. Again, what values would you use?
>
> Some suggest using freeable memory as a yardstick where freeable is
> "free" + "cached" or MemFree + Cached + Inactive. Even in this case what
> would you use for values to alert on?
>
> I've always ignored swap stats as if you are swapping it is too late.
>
> What do you use to detect memory issues ?
>
> Kyle
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 12 2018 - 20:20:55 CET