Re: kernel panic on Red Hat AS because of OCFS
On Mon, 22 Sep 2003 21:05:00 -0700, wangbin wrote:
> We use RAC 9.2.0.3 with OCFS 1.0.8 on Red Hat AS 2.1 with kernel
> 2.4.9-e.16smp in production. It had been running in a test environment
> for 4-6 months with previous versions of OCFS and the Linux kernel.
> However, OCFS now appears to be causing kernel panics. We have
> more than 20 Linux boxes, and the kernel panics happen only on the RAC
> boxes, where OCFS is used. They happen about once or twice per week.
>
> After a panic, I sometimes find the following in /var/log/messages;
> other times I can see it only on the monitor.
> Aug 12 13:01:03 rac1 kernel: Unable to handle kernel NULL pointer
> dereference
> at virtual address 00000074
> Aug 12 13:01:03 rac1 kernel: printing eip:
> Aug 12 13:01:03 rac1 kernel: c01382e7
> Aug 12 13:01:03 rac1 kernel: *pde = 00000000
> Aug 12 13:01:03 rac1 kernel: Oops: 0000
> Aug 12 13:01:03 rac1 kernel: Kernel 2.4.9-e.16smp
> Aug 12 13:01:03 rac1 kernel: CPU: 0
> Aug 12 13:01:03 rac1 kernel: EIP: 0010:[kfree+55/144] Not tainted
> Aug 12 13:01:03 rac1 kernel: EIP: 0010:[<c01382e7>] Not tainted
> Aug 12 13:01:03 rac1 kernel: EFLAGS: 00010086
> Aug 12 13:01:03 rac1 kernel: EIP is at kfree [kernel] 0x37
> Aug 12 13:01:03 rac1 kernel: eax: 00000000 ebx: ece02860 ecx: 00000000
> edx: 00038c20
> Aug 12 13:01:03 rac1 kernel: esi: f8c20000 edi: 00000286 ebp: f4e8e240
> esp: dd757ea0
> Aug 12 13:01:04 rac1 kernel: ds: 0018 es: 0018 ss: 0018
> Aug 12 13:01:04 rac1 kernel: Process find (pid: 8393,
> stackpage=dd757000)
> Aug 12 13:01:04 rac1 kernel: Stack: f8c20000 00000000 eb6dea80
> ffffffff
> 00021000 ece02860 f1c708a0 f4e58e40
> Aug 12 13:01:04 rac1 kernel: f8b37b64 f8c20000 c8af75ec 00000001
> 00000001 4017b000 c75e2404 40400000
> Aug 12 13:01:04 rac1 kernel: c0372020 00000001 00000000 00000000
> f4e8e240 eb6dea80 dd756000 c0117e90
> Aug 12 13:01:04 rac1 kernel: Call Trace: [<f8b37b64>]
> ocfs_file_release [ocfs]
> 0x140
> Aug 12 13:01:04 rac1 kernel: [do_page_fault+0/1168] do_page_fault
> [kernel] 0x0
> Aug 12 13:01:04 rac1 kernel: [<c0117e90>] do_page_fault [kernel] 0x0
> Aug 12 13:01:04 rac1 kernel: [do_page_fault+422/1168] do_page_fault
> [kernel]
> 0x1a6
> Aug 12 13:01:04 rac1 kernel: [<c0118036>] do_page_fault [kernel] 0x1a6
> Aug 12 13:01:04 rac1 kernel: [unmap_fixup+315/352] unmap_fixup
> [kernel] 0x13b
> Aug 12 13:01:04 rac1 kernel: [<c012edab>] unmap_fixup [kernel] 0x13b
> Aug 12 13:01:04 rac1 kernel: [unmap_fixup+331/352] unmap_fixup
> [kernel] 0x14b
> Aug 12 13:01:04 rac1 kernel: [<c012edbb>] unmap_fixup [kernel] 0x14b
> Aug 12 13:01:04 rac1 kernel: [__fput+43/208] __fput [kernel] 0x2b
> Aug 12 13:01:04 rac1 kernel: [<c014697b>] __fput [kernel] 0x2b
> Aug 12 13:01:04 rac1 kernel: [filp_close+158/176] filp_close [kernel]
> 0x9e
> Aug 12 13:01:04 rac1 kernel: [<c014558e>] filp_close [kernel] 0x9e
> Aug 12 13:01:04 rac1 kernel: [sys_close+91/112] sys_close [kernel]
> 0x5b
> Aug 12 13:01:04 rac1 kernel: [<c01455fb>] sys_close [kernel] 0x5b
> Aug 12 13:01:04 rac1 kernel: [system_call+51/56] system_call [kernel]
> 0x33
> Aug 12 13:01:04 rac1 kernel: [<c01072e3>] system_call [kernel] 0x33
> Aug 12 13:01:04 rac1 kernel: Code: 8b 5c 81 74 85 db 74 37 8b 13 3b 53
> 04 73 0a
> 89 74 93 08 ff
> Aug 12 13:01:04 rac1 kernel: <0>Kernel panic: not continuing
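The oops above can be cross-checked by hand. The following is a minimal sketch, not a disassembler: it decodes only the first instruction of the "Code:" line, on the assumption that the byte dump starts at the faulting EIP. The function name decode_mov_load is made up for this illustration; the register values are copied from the oops.

```python
# Sketch: decode the first instruction of the oops "Code:" bytes by hand.
# Assumption: the dump starts at the faulting EIP (kfree+0x37).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code, regs):
    """Decode `8b /r` (mov r32, r/m32) in the SIB + 8-bit displacement form."""
    opcode, modrm, sib, disp = code[0], code[1], code[2], code[3]
    if opcode != 0x8B:
        raise ValueError("not a mov r32, r/m32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm != 4:
        raise ValueError("this sketch only handles SIB + disp8")
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    if index == 4:
        raise ValueError("no-index SIB form is not handled here")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    addr = (regs[REGS[base]] + regs[REGS[index]] * scale + disp) & 0xFFFFFFFF
    text = f"mov {REGS[reg]}, [{REGS[base]} + {REGS[index]}*{scale} {disp:+#x}]"
    return text, addr


code = bytes.fromhex("8b 5c 81 74 85 db 74 37")  # start of the Code: line
regs = {"eax": 0x00000000, "ecx": 0x00000000}    # values from the oops
text, addr = decode_mov_load(code, regs)
print(text, "-> faulting address 0x%08x" % addr)
```

With eax and ecx both zero, the computed address matches the reported "virtual address 00000074", so the dump is at least consistent with kfree reading through a NULL base pointer at offset 0x74. That alone does not say who handed kfree the bad pointer, although the call trace shows ocfs_file_release as the caller.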
>
> More often, I find the following in /var/log/messages:
> Sep 15 15:34:58 rac2 kernel: (4622) ERROR: Access denied while opening
> file, Linux/ocfsmain.c, 2189
> Sep 15 15:34:58 rac2 kernel: (4623) ERROR: Access denied while opening
> file, Linux/ocfsmain.c, 2189
> Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
> Common/ocfsgencreate.c, 1605
> Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
> Common/ocfsgencreate.c, 1794
> Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
> Linux/ocfsmain.c, 1942
> Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
> Linux/ocfsmain.c, 2266
> Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
> Common/ocfsgendirnode.c, 1379
> Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
> Common/ocfsgendirnode.c, 1379
> Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
> Common/ocfsgentrans.c, 396
> Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
> Common/ocfsgentrans.c, 396
> Before a kernel panic, I find heaps of these errors. Normally alert.log
> shows that the DB is trying to write something, such as a log switch.
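If the "status = -12" values follow the usual Linux kernel convention of returning a negated errno (an assumption; the OCFS source would have to confirm it), the number can be looked up as follows:

```python
import errno
import os

status = -12  # value taken from the OCFS messages above

# Assumption: status is a negated errno, as is conventional in kernel code.
name = errno.errorcode.get(-status, "unknown")
print(name, "-", os.strerror(-status))
```

On Linux, errno 12 is ENOMEM, which would point to memory allocation failures in the kernel shortly before the panics.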
>
> I opened a TAR with Oracle Support. The first response was to upgrade to
> the latest version of OCFS, which is 1.0.9. I upgraded, but it didn't fix
> the problem. The latest response is
> "You may need to collect some more information taking assistance from
> the OS vendor. We need to get the stack when the kernel panicked
> indicating that the kernel panic after upgrading to OCFS 1.0.9 is
> caused by OCFS and is not for any other OS reason. We need information
> like stack and register values for progressing this as OCFS issue. You
> can dump the kernel after the panic and pass it on the OS vendor from
> which they can collect the above mentioned information and pass it on
> to us."
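The stack and register values Support asks for are already present in an oops like the one quoted earlier, whenever it reaches the log. A rough sketch for pulling such blocks out of a syslog file follows. The function name extract_oops and the start/end marker strings are choices made for this illustration, based only on the messages shown in this post; it is no substitute for a real kernel dump.

```python
# Sketch: collect each oops block from syslog lines, from the
# "Unable to handle kernel" line through the "Kernel panic" line.
START = "kernel: Unable to handle kernel"
END = "Kernel panic"


def extract_oops(lines):
    """Return a list of oops blocks, each a list of log lines."""
    blocks, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        if START in line:
            if current:  # previous oops never reached a panic line
                blocks.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
            if END in line:
                blocks.append(current)
                current = None
    if current:
        blocks.append(current)
    return blocks


sample = [
    "Aug 12 13:00:59 rac1 sshd[812]: unrelated message",
    "Aug 12 13:01:03 rac1 kernel: Unable to handle kernel NULL pointer dereference",
    "Aug 12 13:01:03 rac1 kernel: EIP is at kfree [kernel] 0x37",
    "Aug 12 13:01:04 rac1 kernel: <0>Kernel panic: not continuing",
]
blocks = extract_oops(sample)
print(len(blocks), "oops block(s) found")
```

To run it against the real log, pass an open file instead of the sample list, for example extract_oops(open("/var/log/messages")). As noted in the original post, the oops does not always make it into the log before the box goes down, so an empty result does not mean no panic happened.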
>
> I'm a DBA with limited system administration knowledge. Red Hat also
> provides very limited support. I'm not sure what next step I can take
> from here.
>
> Any assistance would be greatly appreciated.
>
> Regards,
> Bin
>
> BTW, I first accidentally posted this in comp.databases.oracle on Sunday,
> then reposted it in comp.databases.oracle.server yesterday, but I still
> cannot find it today. So I am posting it again.
Received on Tue Sep 23 2003 - 19:41:40 CDT