Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: kernel panic on Red Hat AS because of OCFS

From: snip3r <snip3r_at_nospam.com>
Date: Tue, 23 Sep 2003 17:41:40 -0700
Message-Id: <pan.2003.09.24.00.41.27.620246@nospam.com>

  1. -12 means ENOMEM. You are running out of memory, at least kernel memory. There is nothing ocfs or any other module can do when that happens.
  2. RH has made significant memory-related fixes in e25. Upgrade to that.
  3. Upgrade ocfs to 1.0.9-6.
  4. Ensure ocfs is listed in PRUNEFS in /etc/updatedb.conf.
  5. The process that crashed is find. What were you doing when it crashed?
  6. The EIP is in kfree. This is a memory issue.
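
To make points 1 and 4 concrete, here is a rough Python sketch (not part of ocfs or any Red Hat tool). It decodes a negative status such as -12 into its errno name and checks whether a filesystem type appears in the PRUNEFS line of updatedb.conf. The helper names `describe_status` and `prunefs_lists` are made up for illustration. The sketch assumes that PRUNEFS is a whitespace-separated list on a single line, and it compares names case-insensitively, which is also an assumption. The likely point of item 4 is to keep updatedb's file scan off the OCFS volumes, but check that against your own setup.

```python
import errno
import os
import re


def describe_status(status: int) -> str:
    """Map a negative kernel-style status (e.g. -12) to its errno name and message."""
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def prunefs_lists(conf_text: str, fstype: str = "ocfs") -> bool:
    """Return True if fstype appears in a PRUNEFS assignment in updatedb.conf text.

    Assumes PRUNEFS is a whitespace-separated list on one line, with or
    without spaces around '=' and with or without double quotes.
    """
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments
        m = re.match(r'\s*(?:export\s+)?PRUNEFS\s*=\s*"?([^"]*)"?', line)
        if m and fstype.lower() in m.group(1).lower().split():
            return True
    return False


if __name__ == "__main__":
    print(describe_status(-12))
    try:
        with open("/etc/updatedb.conf") as f:
            print("ocfs pruned:", prunefs_lists(f.read()))
    except OSError as exc:
        print("could not read /etc/updatedb.conf:", exc)
```

If the second check prints False on a RAC node, add ocfs to the PRUNEFS list by hand.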

On Mon, 22 Sep 2003 21:05:00 -0700, wangbin wrote:

> We use RAC 9.2.0.3 with OCFS 1.0.8 on Red Hat AS 2.1 with kernel
> 2.4.9-e.16smp in production. It had been running in a test environment
> for 4-6 months with previous versions of ocfs and the Linux kernel.
> However, it now appears that OCFS has started causing kernel panics.
> We have more than 20 Linux boxes, and the kernel panic only happens on
> the RAC boxes, where OCFS is used. It happens about once or twice per
> week.
>
> After it panics, sometimes I find the following in /var/log/messages,
> sometimes I can only see it on the monitor.
> Aug 12 13:01:03 rac1 kernel: Unable to handle kernel NULL pointer
> dereference
> at virtual address 00000074
> Aug 12 13:01:03 rac1 kernel: printing eip:
> Aug 12 13:01:03 rac1 kernel: c01382e7
> Aug 12 13:01:03 rac1 kernel: *pde = 00000000
> Aug 12 13:01:03 rac1 kernel: Oops: 0000
> Aug 12 13:01:03 rac1 kernel: Kernel 2.4.9-e.16smp
> Aug 12 13:01:03 rac1 kernel: CPU: 0
> Aug 12 13:01:03 rac1 kernel: EIP: 0010:[kfree+55/144] Not tainted
> Aug 12 13:01:03 rac1 kernel: EIP: 0010:[<c01382e7>] Not tainted
> Aug 12 13:01:03 rac1 kernel: EFLAGS: 00010086
> Aug 12 13:01:03 rac1 kernel: EIP is at kfree [kernel] 0x37
> Aug 12 13:01:03 rac1 kernel: eax: 00000000 ebx: ece02860 ecx: 00000000
> edx: 00038c20
> Aug 12 13:01:03 rac1 kernel: esi: f8c20000 edi: 00000286 ebp: f4e8e240
> esp: dd757ea0
> Aug 12 13:01:04 rac1 kernel: ds: 0018 es: 0018 ss: 0018
> Aug 12 13:01:04 rac1 kernel: Process find (pid: 8393,
> stackpage=dd757000)
> Aug 12 13:01:04 rac1 kernel: Stack: f8c20000 00000000 eb6dea80
> ffffffff
> 00021000 ece02860 f1c708a0 f4e58e40
> Aug 12 13:01:04 rac1 kernel: f8b37b64 f8c20000 c8af75ec 00000001
> 00000001 4017b000 c75e2404 40400000
> Aug 12 13:01:04 rac1 kernel: c0372020 00000001 00000000 00000000
> f4e8e240 eb6dea80 dd756000 c0117e90
> Aug 12 13:01:04 rac1 kernel: Call Trace: [<f8b37b64>]
> ocfs_file_release [ocfs]
> 0x140
> Aug 12 13:01:04 rac1 kernel: [do_page_fault+0/1168] do_page_fault
> [kernel] 0x0
> Aug 12 13:01:04 rac1 kernel: [<c0117e90>] do_page_fault [kernel] 0x0
> Aug 12 13:01:04 rac1 kernel: [do_page_fault+422/1168] do_page_fault
> [kernel]
> 0x1a6
> Aug 12 13:01:04 rac1 kernel: [<c0118036>] do_page_fault [kernel] 0x1a6
> Aug 12 13:01:04 rac1 kernel: [unmap_fixup+315/352] unmap_fixup
> [kernel] 0x13b
> Aug 12 13:01:04 rac1 kernel: [<c012edab>] unmap_fixup [kernel] 0x13b
> Aug 12 13:01:04 rac1 kernel: [unmap_fixup+331/352] unmap_fixup
> [kernel] 0x14b
> Aug 12 13:01:04 rac1 kernel: [<c012edbb>] unmap_fixup [kernel] 0x14b
> Aug 12 13:01:04 rac1 kernel: [__fput+43/208] __fput [kernel] 0x2b
> Aug 12 13:01:04 rac1 kernel: [<c014697b>] __fput [kernel] 0x2b
> Aug 12 13:01:04 rac1 kernel: [filp_close+158/176] filp_close [kernel]
> 0x9e
> Aug 12 13:01:04 rac1 kernel: [<c014558e>] filp_close [kernel] 0x9e
> Aug 12 13:01:04 rac1 kernel: [sys_close+91/112] sys_close [kernel]
> 0x5b
> Aug 12 13:01:04 rac1 kernel: [<c01455fb>] sys_close [kernel] 0x5b
> Aug 12 13:01:04 rac1 kernel: [system_call+51/56] system_call [kernel]
> 0x33
> Aug 12 13:01:04 rac1 kernel: [<c01072e3>] system_call [kernel] 0x33
> Aug 12 13:01:04 rac1 kernel: Code: 8b 5c 81 74 85 db 74 37 8b 13 3b 53
> 04 73 0a
> 89 74 93 08 ff
> Aug 12 13:01:04 rac1 kernel: <0>Kernel panic: not continuing
>
> More often in /var/log/messages, I can find
> Sep 15 15:34:58 rac2 kernel: (4622) ERROR: Access denied while opening
> file, Linux/ocfsmain.c, 2189
> Sep 15 15:34:58 rac2 kernel: (4623) ERROR: Access denied while opening
> file, Linux/ocfsmain.c, 2189
> Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
> Common/ocfsgencreate.c, 1605
> Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
> Common/ocfsgencreate.c, 1794
> Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
> Linux/ocfsmain.c, 1942
> Sep 12 12:35:16 rac2 kernel: (2786) ERROR: status = -12,
> Linux/ocfsmain.c, 2266
> Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
> Common/ocfsgendirnode.c, 1379
> Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
> Common/ocfsgendirnode.c, 1379
> Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
> Common/ocfsgentrans.c, 396
> Sep 18 13:35:30 rac1 kernel: (18856) ERROR: status = -12,
> Common/ocfsgentrans.c, 396
> Before the kernel panic, I can find heaps of them. Normally alert.log
> shows that the DB is trying to write something, such as a log switch.
>
> I opened a TAR with Oracle Support. The first response was to upgrade
> to the latest version of OCFS, which is 1.0.9. I upgraded, but it
> didn't fix the problem. The latest response is
> "You may need to collect some more information taking assistance from
> the OS vendor.We need to get the stack when the kernel paniced
> indicating that the kernel panic after upgrading to OCFS 1.0.9 is
> caused by OCFS and is not for any other OS reason. We need information
> like stack and register values for progressing this as OCFS issue. You
> can dump the kernel after the panic and pass it on the OS vendor from
> which they can collect the above mentioned information and pass it on
> to us."
>
> I'm a DBA with limited knowledge of SA. Red Hat also provides very
> limited support. I'm not sure what the next step is from here.
>
> Any assistance would be greatly appreciated.
>
> Regards,
> Bin
>
> BTW, I first accidentally posted this in comp.databases.oracle on
> Sunday, then reposted it in comp.databases.oracle.server yesterday,
> but I still cannot find it today. So I am posting it again.
Received on Tue Sep 23 2003 - 19:41:40 CDT
