Oracle FAQ Your Portal to the Oracle Knowledge Grid

(Re:)kernel panic on Red Hat AS because of OCFS

From: wangbin <wangbin_at_start.com.au>
Date: 15 Jan 2004 20:00:08 -0800
Message-ID: <2d15bd69.0401152000.2c54c76a@posting.google.com>


I'm not sure why I cannot follow up on the original thread. The issue is still there.
Basically, we run RAC 9.2.0.3 with OCFS 1.0.9-12 on Red Hat AS 2.1 with kernel 2.4.9-e.25. Each node has 2.5 GB of memory. The system is quite stable as long as I don't touch the OCFS file systems. After the server has been running for a while, for example one or two weeks, I can no longer use commands such as ls or find on the OCFS file systems, and Oracle cannot create any new files in them either. If I try, I get the following errors in /var/log/messages:
Dec 31 13:26:54 rac1 kernel: (30983) ERROR: status = -12, Common/ocfsgencreate.c, 1689
Dec 31 13:26:54 rac1 kernel: (30983) ERROR: status = -12, Linux/ocfsmain.c, 2122
After this happens a couple of times, the node may hang and die. It happens on both the production and the test systems, regardless of the load on the box.
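
As an aid for reading the log lines above: Linux kernel code conventionally reports failures as negated errno values, so the status number can be looked up in the errno table. The following is only a minimal sketch. The regular expression and the helper name `decode_status` are assumptions made for illustration and are not part of OCFS.

```python
import errno
import os
import re

# Sample line copied from the /var/log/messages excerpt above.
LINE = ("Dec 31 13:26:54 rac1 kernel: (30983) ERROR: status = -12, "
        "Common/ocfsgencreate.c, 1689")


def decode_status(line):
    """Return (errno name, description) for a 'status = -N' line, or None."""
    match = re.search(r"status = -(\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


print(decode_status(LINE))
```

On Linux, code 12 maps to ENOMEM, which only says that some kernel allocation failed, not which memory zone ran short.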

I raised this as an issue with Oracle. The response was:
"The system/OCFS returning -12 or ENOMEM, that means that there is GENERIC OS memory exhaustion."
They suggested tuning the VM parameters:
echo "35000 45000 50000" > /proc/sys/vm/freepages
The evidence comes from meminfo, where HighFree is only 3M.

        total:       used:       free:     shared:   buffers:    cached:
Mem:  2636136448  2629435392    6701056  1217822720  204898304  877502464
Swap: 4301758464   360468480 3941289984

MemTotal:      2574352 kB
MemFree:          6544 kB
MemShared:     1189280 kB
Buffers:        200096 kB
Cached:         598496 kB
SwapCached:     258440 kB
Active:        1442604 kB
Inact_dirty:    261412 kB
Inact_clean:    542296 kB
Inact_target:   643544 kB
HighTotal:     1703856 kB
HighFree:         2036 kB
LowTotal:       870496 kB
LowFree:          4508 kB
SwapTotal:     4200936 kB
SwapFree:      3848916 kB
BigPagesFree:        0 kB
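
To put the zone figures above in proportion, here is a rough sketch that parses meminfo-style text and reports how much of low memory, high memory and swap is free. The values are copied from the listing above. The function names `parse_meminfo` and `free_percent` are hypothetical helpers, not an existing tool.

```python
# Values copied from the /proc/meminfo excerpt above (all in kB).
MEMINFO = """\
HighTotal:     1703856 kB
HighFree:         2036 kB
LowTotal:       870496 kB
LowFree:          4508 kB
SwapTotal:     4200936 kB
SwapFree:      3848916 kB
"""


def parse_meminfo(text):
    """Parse 'Name:  value kB' lines into a dict of integers (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip lines that do not look like 'Name: <number> ...'.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def free_percent(info, zone):
    """Percentage of a zone ('Low', 'High' or 'Swap') that is free."""
    return 100.0 * info[zone + "Free"] / info[zone + "Total"]


info = parse_meminfo(MEMINFO)
for zone in ("Low", "High", "Swap"):
    print("%-4s free: %5.1f%%" % (zone, free_percent(info, zone)))
```

On a live box the same parser could be fed the contents of /proc/meminfo instead of the embedded sample. With the numbers above, both low and high memory are under 1% free while swap is mostly unused.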

From the database point of view, performance is fine. The vmstat output shows no swapping, and only about 10% of the swap space is in use.

 r  b  w   swpd  free   buff  cache  si  so   bi   bo    in    cs  us  sy  id
 0  1  0 352000  7004 200604 600944   0   0    1    2     0     0   2   0   2
 1  0  0 352000  7004 200608 600944   0   0  116   95  1155  2724   4   1  95
 1  0  0 352000  6996 200612 600944   0   0  120  141  1806  4114   6   2  92
 0  0  0 352000  7004 200616 600944   0   0  124  129  1454  3358   4   1  94
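
The "no swapping" reading of the vmstat output above can be checked mechanically by looking at the si (swap-in) and so (swap-out) columns. This is a minimal sketch using the sample rows from this post. `parse_vmstat` is a hypothetical helper that assumes one header line followed by purely numeric rows.

```python
# Header and rows copied from the vmstat output above.
VMSTAT = """\
 r b w swpd free buff cache si so bi bo in cs us sy id
 0 1 0 352000 7004 200604 600944 0 0 1 2 0 0 2 0 2
 1 0 0 352000 7004 200608 600944 0 0 116 95 1155 2724 4 1 95
 1 0 0 352000 6996 200612 600944 0 0 120 141 1806 4114 6 2 92
 0 0 0 352000 7004 200616 600944 0 0 124 129 1454 3358 4 1 94
"""


def parse_vmstat(text):
    """Turn vmstat text into a list of {column name: int} dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(int, row))) for row in rows]


samples = parse_vmstat(VMSTAT)
# Any non-zero si/so value would indicate pages moving to or from swap.
swapping = any(s["si"] or s["so"] for s in samples)
print("swap activity seen:", swapping)
```

Zero si/so only shows that nothing was swapped during the sampling window. It says nothing about how much low memory the kernel has left.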

In http://www.redhat.com/advice/tips/meminfo.html,
"LowFree: The amount of free memory of the low memory region. This is
the memory the kernel can address directly. All kernel datastructures need to go into low memory."

In http://www.oreilly.com/catalog/spt2/chapter/ch04.html
"Occasionally, however, a system will experience a kernel memory
allocation error. While there is a limit on the size of kernel memory,[7] the problem is caused by the kernel trying to get memory when the free list is completely exhausted. Since the kernel cannot always wait for memory to become available, this can cause operations to fail rather than be delayed."

Is it possible that the OCFS driver needs more memory but fails to get it because LowFree is very low? However, http://linuxcompressed.sourceforge.net/vm24/ says that pages on the inactive_clean list can be reused, and the box has plenty in Inact_clean.
"inactive_clean list

I also don't understand how /proc/sys/vm/freepages would affect this, because it controls swap behavior, and there is no swapping happening on the box, only paging.

How can I tell whether a Linux box is suffering from memory exhaustion?

If a Linux box is short of memory, how can I find the memory usage of each process? I checked VmSize from "ps -auxww" and "cat /proc/$pid/status". Since Oracle uses shared memory, the result doesn't give me a clear picture. For example, how much shared memory does a process use, and how much private memory?
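
One possible starting point for the shared/private question is /proc/$pid/statm, whose first three fields are total size, resident pages and shared pages. The sketch below treats resident minus shared as a rough "private" figure. The sample line is hypothetical, the 4 kB page size is an assumption (typical on x86), and the exact meaning of the shared field differs between kernel versions, so the result is an approximation only.

```python
PAGE_KB = 4  # assumption: 4 kB pages, typical on x86

# Hypothetical /proc/<pid>/statm contents (values in pages), for illustration.
SAMPLE_STATM = "120000 90000 85000 5000 0 30000 0"


def resident_split_kb(statm_text, page_kb=PAGE_KB):
    """Split a statm line into size/resident/shared/private figures in kB."""
    size, resident, shared = (int(f) for f in statm_text.split()[:3])
    return {
        "size": size * page_kb,
        "resident": resident * page_kb,
        "shared": shared * page_kb,
        # Rough estimate: resident pages not counted as shared.
        "private": (resident - shared) * page_kb,
    }


print(resident_split_kb(SAMPLE_STATM))
```

For a real process, the input would be the text read from /proc/<pid>/statm. Because every Oracle process attached to the SGA counts the same shared pages, summing the "shared" figures across processes would overstate actual usage.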

Thanks,
Bin

Received on Thu Jan 15 2004 - 22:00:08 CST
