Re: How do you detect memory issues ?
Date: Wed, 12 Dec 2018 20:20:55 +0100
Message-ID: <CALH8A92Urnk=qZCwKasfc1zdofT5cTzp_hOvWfyxP_7UPLVciA_at_mail.gmail.com>
Hi Kyle,
I'm no expert on Linux memory management, but we regularly have to discuss this question with our application ops teams. My explanations might not be 100% correct, but they should be enough to give a good picture (and to show why your question is so complicated to answer).
There is a simple reason why "free" goes down towards zero on a Linux system after some time: Linux tries to make the best possible use of all memory. If memory is not needed for anything else, filesystem cache is a good use for it. That is not a problem at all: if memory is needed, some of the FS cache is given up for something "better".
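To make the "free is low, but that is fine" point concrete, here is a minimal Python sketch that reads /proc/meminfo-style text and reports free memory, cache and the kernel's own MemAvailable estimate side by side. The function names are made up for illustration, and the fallback of "free + buffers + cached" is only a rough approximation for kernels that do not report MemAvailable.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def reclaimable_summary(info):
    """Return (free_kb, cache_kb, available_kb) from a parsed meminfo dict."""
    free_kb = info.get("MemFree", 0)
    cache_kb = info.get("Cached", 0) + info.get("Buffers", 0)
    # MemAvailable is the kernel's own estimate (Linux 3.14 and later).
    # Assumption: on older kernels, free + cache is used as a rough stand-in.
    available_kb = info.get("MemAvailable", free_kb + cache_kb)
    return free_kb, cache_kb, available_kb
```

On a live system you could call it as reclaimable_summary(parse_meminfo(open("/proc/meminfo").read())). A small MemFree next to a large cache and a large available value is the "healthy" picture described above.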
Paging is not bad in itself either:
To stay in Oracle terms: imagine a new instance is started from a new ORACLE_HOME (from which no other instance has been started before). Linux needs to load the Oracle binary and a lot of libraries into memory and execute the binary, which then allocates even more memory (SGA, etc.).
After some time, the Linux kernel might identify some memory pages in the Oracle binary or its libraries which are not used anymore. (To make it "obvious", let's say that is the code responsible for instance startup.) The kernel might decide to page out these memory pages. That does not affect the instance at all, as it doesn't need that code anymore - you will see some page-outs, but they don't hurt at all and free some memory for better use.
Now another instance starts from the same O_H. It needs to load all the binaries & libraries. Fortunately, the first instance has loaded them already, so the 2nd instance can use the same memory pages and doesn't even have to read them from disk. Much faster! - still, it needs those program pages which were paged out before, as they are required at instance startup. So you will see some page-ins. Anything bad here? Not at all! The 2nd instance still starts faster, as the majority of the binaries is already available, and the pages which need to be paged in have already been processed (from file on the filesystem to memory structures), so less work is required and it's faster.
Of course, after some time, these pages will be paged out again, as they are no longer required.
Based on this simplified explanation, page-ins and page-outs are not a problem in themselves. They only become an issue if "hot" (heavily used) pages go out and in on a regular basis during execution, so that the application has to wait for them. Unfortunately I don't know of an algorithm which can tell whether a page-in is good or bad. Maybe something like "count of blocks which are paged out & in within a given time" could be used, but I haven't seen it implemented.
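One rough way to approximate the "paged out & in within a given time" idea is to compare two snapshots of /proc/vmstat and look at the pswpin and pswpout counters (pages swapped in and out since boot). The Python sketch below is only a heuristic under that assumption: the counters do not say whether the same pages went out and came back, and the function names and the "churn" measure are invented for illustration.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_churn(before, after, interval_s):
    """Return (in_rate, out_rate, churn) in pages per second.

    before/after are parsed snapshots taken interval_s seconds apart.
    """
    in_rate = (after["pswpin"] - before["pswpin"]) / interval_s
    out_rate = (after["pswpout"] - before["pswpout"]) / interval_s
    # Heuristic (an assumption, not a kernel metric): churn is the smaller
    # of the two rates. It is only non-zero when pages move in BOTH
    # directions during the same interval.
    return in_rate, out_rate, min(in_rate, out_rate)
```

A one-off burst of page-outs with no page-ins gives a churn of zero, while sustained traffic in both directions gives a non-zero value. Which threshold is worth an alert still depends on the system.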
Similar considerations apply to many other situations. Even estimating whether an application will still fit into memory is not that easy, as it's not only about the binary and its memory structures (which you need to know in advance) - any shared libraries it requires might already be loaded.
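The effect of already-loaded shared pages can be seen in /proc/<pid>/smaps, where Rss counts every resident page of a process while Pss divides shared pages among the processes that map them. Here is a minimal Python sketch that sums those fields for one process. The function name is made up, and the input is assumed to be the text of an smaps file.

```python
FIELDS = ("Rss", "Pss", "Shared_Clean", "Shared_Dirty",
          "Private_Clean", "Private_Dirty")


def sum_smaps(text):
    """Sum selected per-mapping fields (in kB) from /proc/<pid>/smaps text."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        # Mapping header lines and non-numeric fields such as VmFlags
        # never match one of the wanted field names, so they are skipped.
        if sep and name in totals:
            totals[name] += int(rest.split()[0])
    return totals
```

If Pss is much smaller than Rss, a large part of the process is shared with others, so starting one more process of the same kind will need far less additional memory than its Rss suggests.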
These examples only show the simple situations. There might be even more complex scenarios.
My summary: you need to understand your question in extreme detail to be able to answer it, even to some degree.
To all those who see flaws in my explanation: I'm sorry & please explain where I went wrong, so we can all learn!
hth & sorry for the long post,
Martin
On Thu, 6 Dec 2018 at 01:46, kyle Hailey <kylelf_at_gmail.com> wrote:
> One of those questions that seems like it should have been nailed down 20
> years ago but it still seems to lack a clear answer
>
> How do you detect memory issues ?
>
> I always used "*po" or "paged outs*". Now on Amazon Linux I don't see
> "po" but there is "bo" (blocks written out). In the past, at least on OSF &
> Ultrix, page outs were a sign of needed memory that was written out to disk
> and when I needed that memory it would take a big performance hit to read
> it in. Thus "po" was a good canary in the coal mine. Any consistent values
> over say 10 were a sign.
>
> Some people use "*scan rate*" but I never found that as easy to interpret
> as page outs. Again, what values would you use?
>
> Some suggest using freeable memory as a yardstick where freeable is
> "free" + "cached" or MemFree + Cached + Inactive. Even in this case what
> would you use for values to alert on?
>
> I've always ignored swap stats as if you are swapping it is too late.
>
> What do you use to detect memory issues ?
>
> Kyle
>
--
http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 12 2018 - 20:20:55 CET