Re: How do you detect memory issues ?

From: kyle Hailey <kylelf_at_gmail.com>
Date: Wed, 5 Dec 2018 17:16:47 -0800
Message-ID: <CADsdiQiUJdv9Tw0br0iiMO8Narh6012i0EmZQuSOV7C9-nUsrQ_at_mail.gmail.com>



This is interesting:
https://github.com/torvalds/linux/commit/34e431b0ae398fc54ea69ff85ec700722c9da773

       /proc/meminfo: provide estimated available memory

Many load balancing and workload placing programs check /proc/meminfo to estimate how much free memory is available. They generally do this by adding up "free" and "cached", which was fine ten years ago, but is pretty much guaranteed to be wrong today.

It is wrong because Cached includes memory that is not freeable as page cache, for example shared memory segments, tmpfs, and ramfs, and it does not include reclaimable slab memory, which can take up a large fraction of system memory on mostly idle systems with lots of files.

Currently, the amount of memory that is available for a new workload, without pushing the system into swap, can be estimated from MemFree, Active(file), Inactive(file), and SReclaimable, as well as the "low" watermarks from /proc/zoneinfo.

However, this may change in the future, and user space really should not be expected to know kernel internals to come up with an estimate for the amount of free memory.

It is more convenient to provide such an estimate in /proc/meminfo. If things change in the future, we only have to change it in one place.

Signed-off-by: Rik van Riel <riel_at_redhat.com>
Reported-by: Erik Mouw <erik.mouw_2_at_nxp.com>
Acked-by: Johannes Weiner <hannes_at_cmpxchg.org>
Signed-off-by: Andrew Morton <akpm_at_linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds_at_linux-foundation.org>
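
To make the commit message concrete, here is a rough Python sketch of the estimate it describes. It is not the kernel's code. The function names are hypothetical, and the way the low watermark is subtracted (capping the deduction at half of the page cache and half of the reclaimable slab) reflects my reading of the heuristic and should be treated as an assumption. The low watermark has to be supplied by the caller in kB; /proc/zoneinfo reports it in pages per zone, so it needs summing and converting first. On kernels that already export MemAvailable, the sketch simply returns that field.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value (kB)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def estimate_available_kb(fields, low_watermark_kb=0):
    """Estimate memory available to a new workload without swapping, in kB.

    Assumption: low_watermark_kb is the sum of the per-zone "low" watermarks
    from /proc/zoneinfo, already converted from pages to kB by the caller.
    """
    # Prefer the kernel's own estimate when it is exported.
    if "MemAvailable" in fields:
        return fields["MemAvailable"]

    # Free memory, minus the reserve the kernel tries to keep free.
    free = fields["MemFree"] - low_watermark_kb

    # File-backed page cache; assume part of it has to stay resident.
    page_cache = fields.get("Active(file)", 0) + fields.get("Inactive(file)", 0)
    page_cache -= min(page_cache // 2, low_watermark_kb)

    # Reclaimable slab, treated the same way.
    slab = fields.get("SReclaimable", 0)
    slab -= min(slab // 2, low_watermark_kb)

    return max(free + page_cache + slab, 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        print(estimate_available_kb(parse_meminfo(handle.read())), "kB")
```

Note that simply adding "free" and "cached" would count shared memory and tmpfs pages as freeable, which is exactly the mistake the commit message warns about.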

On Wed, Dec 5, 2018 at 4:44 PM kyle Hailey <kylelf_at_gmail.com> wrote:

> One of those questions that seems like it should have been nailed down 20
> years ago but still seems to lack a clear answer:
>
> How do you detect memory issues ?
>
> I always used "po" or "page outs". Now on Amazon Linux I don't see
> "po" but there is "bo" (blocks written out). In the past, at least on OSF &
> Ultrix, page outs were a sign that needed memory had been written out to disk,
> and when I needed that memory it would take a big performance hit to read
> it back in. Thus "po" was a good canary in the coal mine. Any consistent values
> over, say, 10 were a sign.
>
> Some people use "scan rate" but I never found that as easy to interpret
> as page outs. Again, what values would you use?
>
> Some suggest using freeable memory as a yardstick where freeable is
> "free" + "cached" or MemFree + Cached + Inactive. Even in this case what
> would you use for values to alert on?
>
> I've always ignored swap stats, because if you are swapping it is too late.
>
> What do you use to detect memory issues ?
>
> Kyle
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Dec 06 2018 - 02:16:47 CET
