
kernel panic on Red Hat AS because of OCFS

From: wangbin <wangbin_at_start.com.au>
Date: 22 Sep 2003 21:05:00 -0700
Message-ID: <2d15bd69.0309222005.5a88aa5e@posting.google.com>


We run RAC 9.2.0.3 with OCFS 1.0.8 on Red Hat AS 2.1 (kernel 2.4.9-e.16smp) in production. The same setup ran in a test environment for 4-6 months with earlier versions of OCFS and the Linux kernel. Now, however, OCFS appears to be causing kernel panics. We have more than 20 Linux boxes, and the panics happen only on the RAC boxes, which are the ones using OCFS. They occur about once or twice per week.

After a panic, I sometimes find the following in /var/log/messages; at other times I can only see it on the monitor.

Aug 12 13:01:03 rac1 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000074
Aug 12 13:01:03 rac1 kernel: printing eip:
Aug 12 13:01:03 rac1 kernel: c01382e7
Aug 12 13:01:03 rac1 kernel: *pde = 00000000
Aug 12 13:01:03 rac1 kernel: Oops: 0000
Aug 12 13:01:03 rac1 kernel: Kernel 2.4.9-e.16smp
Aug 12 13:01:03 rac1 kernel: CPU: 0
Aug 12 13:01:03 rac1 kernel: EIP: 0010:[kfree+55/144] Not tainted
Aug 12 13:01:03 rac1 kernel: EIP: 0010:[<c01382e7>] Not tainted
Aug 12 13:01:03 rac1 kernel: EFLAGS: 00010086
Aug 12 13:01:03 rac1 kernel: EIP is at kfree [kernel] 0x37
Aug 12 13:01:03 rac1 kernel: eax: 00000000 ebx: ece02860 ecx: 00000000 edx: 00038c20
Aug 12 13:01:03 rac1 kernel: esi: f8c20000 edi: 00000286 ebp: f4e8e240 esp: dd757ea0
Aug 12 13:01:04 rac1 kernel: ds: 0018 es: 0018 ss: 0018
Aug 12 13:01:04 rac1 kernel: Process find (pid: 8393, stackpage=dd757000)
Aug 12 13:01:04 rac1 kernel: Stack: f8c20000 00000000 eb6dea80 ffffffff 00021000 ece02860 f1c708a0 f4e58e40
Aug 12 13:01:04 rac1 kernel: f8b37b64 f8c20000 c8af75ec 00000001 00000001 4017b000 c75e2404 40400000
Aug 12 13:01:04 rac1 kernel: c0372020 00000001 00000000 00000000 f4e8e240 eb6dea80 dd756000 c0117e90
Aug 12 13:01:04 rac1 kernel: Call Trace: [<f8b37b64>] ocfs_file_release [ocfs] 0x140
Aug 12 13:01:04 rac1 kernel: [do_page_fault+0/1168] do_page_fault [kernel] 0x0
Aug 12 13:01:04 rac1 kernel: [<c0117e90>] do_page_fault [kernel] 0x0
Aug 12 13:01:04 rac1 kernel: [do_page_fault+422/1168] do_page_fault [kernel] 0x1a6
Aug 12 13:01:04 rac1 kernel: [<c0118036>] do_page_fault [kernel] 0x1a6
Aug 12 13:01:04 rac1 kernel: [unmap_fixup+315/352] unmap_fixup [kernel] 0x13b
Aug 12 13:01:04 rac1 kernel: [<c012edab>] unmap_fixup [kernel] 0x13b
Aug 12 13:01:04 rac1 kernel: [unmap_fixup+331/352] unmap_fixup [kernel] 0x14b
Aug 12 13:01:04 rac1 kernel: [<c012edbb>] unmap_fixup [kernel] 0x14b
Aug 12 13:01:04 rac1 kernel: [__fput+43/208] __fput [kernel] 0x2b
Aug 12 13:01:04 rac1 kernel: [<c014697b>] __fput [kernel] 0x2b
Aug 12 13:01:04 rac1 kernel: [filp_close+158/176] filp_close [kernel] 0x9e
Aug 12 13:01:04 rac1 kernel: [<c014558e>] filp_close [kernel] 0x9e
Aug 12 13:01:04 rac1 kernel: [sys_close+91/112] sys_close [kernel] 0x5b
Aug 12 13:01:04 rac1 kernel: [<c01455fb>] sys_close [kernel] 0x5b
Aug 12 13:01:04 rac1 kernel: [system_call+51/56] system_call [kernel] 0x33
Aug 12 13:01:04 rac1 kernel: [<c01072e3>] system_call [kernel] 0x33
Aug 12 13:01:04 rac1 kernel: Code: 8b 5c 81 74 85 db 74 37 8b 13 3b 53 04 73 0a 89 74 93 08 ff
Aug 12 13:01:04 rac1 kernel: <0>Kernel panic: not continuing
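
A trace like the one above can be summarized mechanically before it is handed to a vendor. The following is a minimal Python sketch, not part of the original report, that groups call-trace frames by the module named in square brackets, so that frames from the ocfs module stand out from core kernel frames. The function name `frames_by_module`, the regular expression, and the shortened sample are illustrative assumptions based only on the log format shown here.

```python
import re
from collections import defaultdict

# Matches frames such as "[<f8b37b64>] ocfs_file_release [ocfs] 0x140".
# The pattern is an assumption drawn from the excerpt, not a full grammar
# of kernel oops output.
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\s+\[(\w+)\]")


def frames_by_module(log_text):
    """Group (address, symbol) pairs by the module each frame belongs to."""
    grouped = defaultdict(list)
    for addr, symbol, module in FRAME_RE.findall(log_text):
        grouped[module].append((addr, symbol))
    return dict(grouped)


# Three lines copied from the trace, used here as sample input.
sample = """\
Aug 12 13:01:04 rac1 kernel: Call Trace: [<f8b37b64>] ocfs_file_release [ocfs] 0x140
Aug 12 13:01:04 rac1 kernel: [<c0117e90>] do_page_fault [kernel] 0x0
Aug 12 13:01:04 rac1 kernel: [<c014697b>] __fput [kernel] 0x2b
"""

result = frames_by_module(sample)
for module, frames in result.items():
    print(module, [symbol for _, symbol in frames])
```

Run against the full excerpt, a grouping like this would show a single ocfs frame (ocfs_file_release) with kfree faulting beneath it; that is suggestive of an OCFS involvement but is not proof of the root cause.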

More often, I find messages like these in /var/log/messages:

Sep 15 15:34:58 rac2 kernel: (4622) ERROR: Access denied while opening file, Linux/ocfsmain.c, 2189
Sep 15 15:34:58 rac2 kernel: (4623) ERROR: Access denied while opening file, Linux/ocfsmain.c, 2189
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12, Common/ocfsgencreate.c, 1605
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12, Common/ocfsgencreate.c, 1794
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12, Linux/ocfsmain.c, 1942
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12, Linux/ocfsmain.c, 2266
Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12, Common/ocfsgendirnode.c, 1379
Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12, Common/ocfsgendirnode.c, 1379
Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12, Common/ocfsgentrans.c, 396
Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12, Common/ocfsgentrans.c, 396

Heaps of these appear before each kernel panic. The alert.log normally shows that the database was trying to write something at the time, such as a log switch.
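
The repeated "status = -12" may be worth decoding. On Linux, errno 12 is ENOMEM (out of memory); whether OCFS reports negated errno values in these messages is an assumption, not something the log confirms. A minimal Python sketch that decodes the value and counts the errors per source file follows. The helper names `tally_ocfs_errors` and `describe_status` are hypothetical.

```python
import errno
import os
import re
from collections import Counter

# Matches lines such as
# "(2786) ERROR: status = -12, Common/ocfsgencreate.c, 1605".
STATUS_RE = re.compile(r"ERROR: status = (-?\d+), ([^,\s]+), (\d+)")


def tally_ocfs_errors(log_text):
    """Count (status, source file) pairs among 'status =' error lines."""
    counts = Counter()
    for status, source, _line in STATUS_RE.findall(log_text):
        counts[(int(status), source)] += 1
    return counts


def describe_status(status):
    """Treat the status as a negated errno (an assumption for OCFS)."""
    code = abs(status)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# Three lines copied from the messages, used here as sample input.
sample = """\
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12, Common/ocfsgencreate.c, 1605
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12, Common/ocfsgencreate.c, 1794
Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12, Linux/ocfsmain.c, 1942
"""

counts = tally_ocfs_errors(sample)
name, text = describe_status(-12)
print(name, text)
for (status, source), n in counts.items():
    print(status, source, n)
```

If the ENOMEM reading is right, memory statistics from the RAC nodes around the time of the errors would be a useful addition to the support request.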

I opened a TAR with Oracle Support. Their first response was to upgrade to the latest version of OCFS, 1.0.9. I upgraded, but that did not fix the problem. Their latest response is:
"You may need to collect some more information taking assistance from the OS vendor.We need to get the stack when the kernel paniced indicating that the kernel panic after upgrading to OCFS 1.0.9 is caused by OCFS and is not for any other OS reason. We need information like stack and register values for progressing this as OCFS issue. You can dump the kernel after the panic and pass it on the OS vendor from which they can collect the above mentioned information and pass it on to us."

I'm a DBA with limited sysadmin knowledge, and Red Hat provides only very limited support. I'm not sure what my next step should be.

Any assistance would be greatly appreciated.

Regards,
Bin

BTW, I first accidentally posted this in comp.databases.oracle on Sunday, then reposted it in comp.databases.oracle.server yesterday, but I still cannot find it today, so I am posting it again.

Received on Mon Sep 22 2003 - 23:05:00 CDT
