kernel panic on Red Hat AS because of OCFS
We run RAC 9.2.0.3 with OCFS 1.0.8 on Red Hat AS 2.1 with kernel
2.4.9-e.16smp in production. The setup ran in a test environment
for 4-6 months with previous versions of OCFS and the Linux kernel.
However, OCFS now appears to be causing kernel panics. We have
more than 20 Linux boxes, and the kernel panics happen only on the RAC
boxes, where OCFS is used. They occur about once or twice per week.
After a panic, sometimes I find the following in /var/log/messages;
other times I can only see it on the monitor.
Aug 12 13:01:03 rac1 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000074
Aug 12 13:01:03 rac1 kernel: printing eip:
Aug 12 13:01:03 rac1 kernel: c01382e7
Aug 12 13:01:03 rac1 kernel: *pde = 00000000
Aug 12 13:01:03 rac1 kernel: Oops: 0000
Aug 12 13:01:03 rac1 kernel: Kernel 2.4.9-e.16smp
Aug 12 13:01:03 rac1 kernel: CPU: 0
Aug 12 13:01:03 rac1 kernel: EIP: 0010:[kfree+55/144] Not tainted
Aug 12 13:01:03 rac1 kernel: EIP: 0010:[<c01382e7>] Not tainted
Aug 12 13:01:03 rac1 kernel: EFLAGS: 00010086
Aug 12 13:01:03 rac1 kernel: EIP is at kfree [kernel] 0x37
Aug 12 13:01:03 rac1 kernel: eax: 00000000 ebx: ece02860 ecx: 00000000 edx: 00038c20
Aug 12 13:01:04 rac1 kernel: [<c012edbb>] unmap_fixup [kernel] 0x14b
Aug 12 13:01:04 rac1 kernel: [__fput+43/208] __fput [kernel] 0x2b
Aug 12 13:01:04 rac1 kernel: [<c014697b>] __fput [kernel] 0x2b
Aug 12 13:01:04 rac1 kernel: [filp_close+158/176] filp_close [kernel] 0x9e
More often, I find the following in /var/log/messages:
Sep 15 15:34:58 rac2 kernel: (4622) ERROR: Access denied while opening
file, Linux/ocfsmain.c, 2189
Sep 15 15:34:58 rac2 kernel: (4623) ERROR: Access denied while opening
file, Linux/ocfsmain.c, 2189
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
Common/ocfsgencreate.c, 1605
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
Common/ocfsgencreate.c, 1794
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
Linux/ocfsmain.c, 1942
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
Linux/ocfsmain.c, 2266
Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
Common/ocfsgendirnode.c, 1379
Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
Common/ocfsgendirnode.c, 1379
Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
Common/ocfsgentrans.c, 396
Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
Common/ocfsgentrans.c, 396
Before a kernel panic, I find heaps of these messages. The alert.log
normally shows that the DB is trying to write something at the time, such as a log switch.
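To see how these errors build up before a panic, the messages could be tallied by host and source location. The following is only a rough sketch in Python, assuming each message sits on a single line in /var/log/messages (the wrapping above comes from posting). The function name tally_ocfs_errors and the pattern are hypothetical illustrations, not part of OCFS or any Oracle tool.

```python
import re
from collections import Counter

# Assumed layout of an OCFS error line in syslog, based on the excerpts above:
#   <month> <day> <time> <host> kernel: (<pid>) ERROR: <message>, <file>.c, <line>
OCFS_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"\((?P<pid>\d+)\) ERROR: (?P<msg>.*), (?P<src>\S+\.c), (?P<line>\d+)$"
)


def tally_ocfs_errors(lines):
    """Count OCFS error messages per (host, source file, line number).

    Lines that do not match the assumed layout are ignored.
    """
    counts = Counter()
    for text in lines:
        match = OCFS_ERROR.match(text.strip())
        if match:
            key = (match["host"], match["src"], int(match["line"]))
            counts[key] += 1
    return counts
```

Passing an open file handle for /var/log/messages to tally_ocfs_errors and printing the result of most_common() would show which source locations report errors most often on each node.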
I opened a TAR with Oracle Support. The first response was to upgrade to the latest
version of OCFS, which is 1.0.9. I upgraded, but that did not fix the
problem. The latest response is:
"You may need to collect some more information taking assistance from
the OS vendor. We need to get the stack when the kernel panicked
indicating that the kernel panic after upgrading to OCFS 1.0.9 is
caused by OCFS and is not for any other OS reason. We need information
like stack and register values for progressing this as OCFS issue. You
can dump the kernel after the panic and pass it on the OS vendor from
which they can collect the above mentioned information and pass it on
to us."
I'm a DBA with limited system administration knowledge, and Red Hat provides only very limited support. I'm not sure what my next step should be.
Any assistance would be greatly appreciated.
Regards,
Bin
BTW, I first accidentally posted this in comp.databases.oracle on Sunday, then reposted it in comp.databases.oracle.server yesterday, but I still cannot find it today, so I am posting it again.
Received on Mon Sep 22 2003 - 23:05:00 CDT