(Re:) kernel panic on Red Hat AS because of OCFS
I'm not sure why I cannot follow up on the original thread. The issue is still
there.
Basically, we run RAC 9.2.0.3 with OCFS 1.0.9-12 on Red Hat AS 2.1 with
kernel 2.4.9-e.25. Each node has 2.5 GB of memory. The system is quite
stable as long as I don't touch the OCFS file systems. After the server has
been running for a while, for example one or two weeks, no command such as
ls or find can access the OCFS file systems, and Oracle cannot create
any new files on them. If I try, I get the following
errors in /var/log/messages:
Dec 31 13:26:54 rac1 kernel: (30983) ERROR: status = -12, Common/ocfsgencreate.c, 1689
Dec 31 13:26:54 rac1 kernel: (30983) ERROR: status = -12, Linux/ocfsmain.c, 2122
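The status value in these messages looks like a negated errno code, which is how Linux kernel code conventionally reports failures. A minimal Python sketch to decode such a value, assuming the standard Linux errno numbering (the helper name decode_status is made up for illustration):

```python
import errno
import os

def decode_status(status):
    """Map a negated errno value (as printed by the driver) to its name and message."""
    code = abs(status)
    # errno.errorcode maps numbers to symbolic names on the machine running this.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

name, text = decode_status(-12)
print(name, "-", text)
```

The message text comes from the local C library, so its exact wording may differ between systems.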
After this happens a couple of times, the node may hang and die. It happens on
both the production and test systems, regardless of the load on the box.
I raised it as an issue with Oracle. The response was
"The system/OCFS returning -12 or ENOMEM, that means that there is
GENERIC OS memory exhaustion.", and they suggested tuning the VM parameters:
echo "35000 45000 50000" > /proc/sys/vm/freepages
The proof is from meminfo, where HighFree is only 3M:
        total:      used:      free:    shared:  buffers:   cached:
Mem:  2636136448 2629435392    6701056 1217822720 204898304 877502464
Swap: 4301758464  360468480 3941289984
MemTotal:      2574352 kB
MemFree:          6544 kB
MemShared:     1189280 kB
Buffers:        200096 kB
Cached:         598496 kB
SwapCached:     258440 kB
Active:        1442604 kB
Inact_dirty:    261412 kB
HighTotal:     1703856 kB
HighFree:         2036 kB
LowTotal:       870496 kB
LowFree:          4508 kB
SwapTotal:     4200936 kB
SwapFree:      3848916 kB
BigPagesFree:        0 kB
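To put the low and high memory figures in proportion, here is a minimal Python sketch that parses meminfo-style text and reports how much of each zone is free. The function names are made up for illustration, the sample reuses only values from the output above, and nothing here defines a threshold for "exhausted"; it only makes the ratios visible.

```python
import re

def parse_meminfo(text):
    """Return a dict of the 'Name: value kB' fields found in meminfo-style text."""
    fields = {}
    for name, value in re.findall(r"(\w+):\s+(\d+) kB", text):
        fields[name] = int(value)
    return fields

def free_percent(fields, zone):
    """Percentage of a zone ('Low' or 'High') that is currently free."""
    return 100.0 * fields[zone + "Free"] / fields[zone + "Total"]

# Values copied from the meminfo output quoted in this post.
SAMPLE = """MemTotal: 2574352 kB MemFree: 6544 kB
HighTotal: 1703856 kB HighFree: 2036 kB
LowTotal: 870496 kB LowFree: 4508 kB"""

info = parse_meminfo(SAMPLE)
print("low  free: %.2f%%" % free_percent(info, "Low"))
print("high free: %.2f%%" % free_percent(info, "High"))
```

On a live box the same functions could be fed the contents of /proc/meminfo instead of SAMPLE.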
From the database point of view, performance is fine. The vmstat output
shows no swapping activity, and only about 10% of the swap space is in use.
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  0 352000   7004 200604 600944   0   0     1     2    0     0   2   0   2
 1  0  0 352000   7004 200608 600944   0   0   116    95 1155  2724   4   1  95
 1  0  0 352000   6996 200612 600944   0   0   120   141 1806  4114   6   2  92
 0  0  0 352000   7004 200616 600944   0   0   124   129 1454  3358   4   1  94
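The "no swap" reading of this output can be checked mechanically by looking at the si and so columns. A minimal Python sketch, assuming the column layout shown above (the function name parse_vmstat is made up for illustration):

```python
def parse_vmstat(lines):
    """Turn a vmstat header line plus data rows into a list of dicts."""
    header = lines[0].split()
    return [dict(zip(header, map(int, row.split()))) for row in lines[1:]]

# Header and rows copied from the vmstat output quoted in this post.
SAMPLE = [
    "r b w swpd free buff cache si so bi bo in cs us sy id",
    "0 1 0 352000 7004 200604 600944 0 0 1 2 0 0 2 0 2",
    "1 0 0 352000 7004 200608 600944 0 0 116 95 1155 2724 4 1 95",
    "1 0 0 352000 6996 200612 600944 0 0 120 141 1806 4114 6 2 92",
    "0 0 0 352000 7004 200616 600944 0 0 124 129 1454 3358 4 1 94",
]

samples = parse_vmstat(SAMPLE)
# Any non-zero swap-in (si) or swap-out (so) value means swapping occurred.
swapping = any(s["si"] or s["so"] for s in samples)
print("swap activity seen:", swapping)
```

Note that the swpd column stays at 352000 throughout: swap space is occupied, but nothing is moving in or out during the sampled intervals.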
In http://www.redhat.com/advice/tips/meminfo.html,
"LowFree: The amount of free memory of the low memory region. This is
the memory the kernel can address directly. All kernel datastructures
need to go into low memory."
In http://www.oreilly.com/catalog/spt2/chapter/ch04.html
"Occasionally, however, a system will experience a kernel memory
allocation error. While there is a limit on the size of kernel
memory,[7] the problem is caused by the kernel trying to get memory
when the free list is completely exhausted. Since the kernel cannot
always wait for memory to become available, this can cause operations
to fail rather than be delayed."
Is it possible that the OCFS driver needs more memory, but the allocation
fails because LowFree is very low? However,
http://linuxcompressed.sourceforge.net/vm24/ shows that pages on the
inactive_clean list can be reused, and the box has a lot in inact_clean.
"inactive_clean list
I also don't know how /proc/sys/vm/freepages would affect it, because /proc/sys/vm/freepages affects swap behavior, and there is no swapping happening on the box, only paging.
How can I tell whether a Linux box is suffering from memory exhaustion?
If I find that a Linux box is out of memory, how can I find the memory usage of each process? I checked VmSize from "ps -auxww" and "cat /proc/$pid/status". Since Oracle uses shared memory, the result doesn't give me a clear picture. For example, how much shared memory does a process use, and how much private memory?
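One rough way to approach the shared versus private question is /proc/$pid/statm, whose first three fields are total size, resident pages, and shared pages. The Python sketch below is only an approximation under stated assumptions: a 4 kB page size (typical for i386), a made-up helper name split_statm, and a hypothetical sample line rather than real data from this system. Resident minus shared is at best an estimate of private memory, and how the shared field is counted varies between kernel versions.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as on i386

def split_statm(line, page_kb=PAGE_KB):
    """Convert the first three statm fields (in pages) to kB and estimate private memory."""
    size, resident, shared = [int(x) for x in line.split()[:3]]
    return {
        "size_kb": size * page_kb,
        "resident_kb": resident * page_kb,
        "shared_kb": shared * page_kb,
        # Rough estimate only: resident pages not counted as shared.
        "private_kb": (resident - shared) * page_kb,
    }

def read_statm(pid):
    """Read /proc/<pid>/statm on a live Linux system (not called in this example)."""
    with open("/proc/%d/statm" % pid) as f:
        return split_statm(f.read())

# Hypothetical statm line for illustration only, not taken from the system above.
example = split_statm("1000 800 600 0 0 0 0")
print(example)
```

Summing private_kb over all Oracle processes and adding the shared segment once would give a less inflated total than summing VmSize per process.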
Thanks,
Bin
Received on Thu Jan 15 2004 - 22:00:08 CST