Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: kernel panic on Red Hat AS because of OCFS

From: wangbin <wangbin_at_start.com.au>
Date: 25 Sep 2003 17:33:20 -0700
Message-ID: <2d15bd69.0309251633.15ce77c2@posting.google.com>

Hi,
Thanks for all the responses.

I totally agree with Dusan. I didn't recommend it, but the manager was won over by Oracle marketing ...

I checked (note 225710.1), and 2.4.9-e.16smp is a supported kernel. I sent http://otn.oracle.com/tech/linux/htdocs/linux_techsupp_faq.html in reply to Oracle support, and the status of my TAR changed from "Waiting for Customer" to "Awaiting Internal Response".

Upgrading is always an option. However, keeping the production system up to date with each release of the Linux kernel and ocfs is really painful. Firstly, we boot from EMC, which only supports a certain kernel version. Secondly, a new version can bring new bugs. For example, our backup solution is the following:

1. The production DB uses ocfs on mirrored shared disk. There is a set of disks (BCV) that can sync with the production DB disks or split from them;
2. When the backup starts, sync the BCV with the production DB (data files) disk;
3. Alter all tablespaces into hot backup mode;
4. Split the BCV from the production DB disk;
5. Alter all tablespaces back to normal;
6. Do the same for the archive logs;
7. Then we back up that data to tape, and use it for recovery as well as for building the test database.
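
The ordering of the backup steps above matters: the tablespaces must be in hot backup mode before the BCV is split and must be taken out of it afterwards. A minimal Python sketch of that ordering follows. The function names and the tablespace names are made up for illustration, and the storage-side actions (sync, split, tape copy) are left as angle-bracket placeholders because the actual array commands are not given here. Only the ALTER TABLESPACE ... BEGIN/END BACKUP statements are real Oracle SQL.

```python
from typing import List, Sequence


def hot_backup_statements(tablespaces: Sequence[str], begin: bool) -> List[str]:
    """Build one ALTER TABLESPACE ... BEGIN/END BACKUP statement per tablespace."""
    action = "BEGIN" if begin else "END"
    return [f"ALTER TABLESPACE {name} {action} BACKUP;" for name in tablespaces]


def bcv_backup_plan(tablespaces: Sequence[str]) -> List[str]:
    """Return the backup steps in order, as plain strings.

    Entries in angle brackets are placeholders for storage-side actions;
    the real commands depend on the array tooling and are not shown.
    """
    plan = ["<sync BCV with data file disks>"]
    plan += hot_backup_statements(tablespaces, begin=True)
    plan.append("<split BCV from data file disks>")
    plan += hot_backup_statements(tablespaces, begin=False)
    plan.append("<sync and split BCV for archive log disks>")
    plan.append("<copy BCV contents to tape>")
    return plan


if __name__ == "__main__":
    # Hypothetical tablespace names, for illustration only.
    for step in bcv_backup_plan(["SYSTEM", "USERS"]):
        print(step)
```

Keeping the plan as data makes it easy to review the order before anything touches the production disks.
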

To restore those backup disks to the test RAC, the procedure is:
1. Shut down the DB, gsd, and oracm on both nodes of the test RAC;
2. Unmount all ocfs file systems;
3. Restore data from the BCV to the test RAC's disks;
4. Mount the ocfs file systems on both nodes.
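
Step 3 of the restore should only run once every node has really unmounted its ocfs file systems. One way to guard that is to inspect each node's mount table first. The sketch below is an illustration, not part of the original procedure: it assumes the text comes from /proc/mounts on each node and that the file system type is reported as "ocfs". The function names and node labels are hypothetical.

```python
from typing import Dict, List


def mounted_ocfs(mounts_text: str) -> List[str]:
    """Return mount points whose file system type is ocfs.

    Expects text in the /proc/mounts layout:
    device, mount point, type, options, dump, pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "ocfs":
            points.append(fields[1])
    return points


def safe_to_restore(mounts_by_node: Dict[str, str]) -> bool:
    """True only when no node still has an ocfs file system mounted."""
    return all(not mounted_ocfs(text) for text in mounts_by_node.values())
```

How the mount table is collected from the second node (ssh, a shared file, and so on) is left open here.
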

All of these procedures have been tested and have worked for the last five months. Yesterday I tried the restore after the upgrade to 1.0.9. On node two of the test RAC it worked. On node one, however, the box panicked every time I mounted. After more than 20 panics we tried something different: I used ocfs_uid_gen to regenerate the GUID and then mounted with ocfstool, and it panicked again. However, when the system came back up the problem had disappeared, and I can now mount the file systems without any problem. I reckon it is a new 1.0.9 "feature"; it looks like Bug 3134746, which will be fixed in ocfs 1.0.9.7.

>3. Upgrade ocfs to 1.0.9-6.

BTW, how do you upgrade ocfs?
According to the readme, all you need to do is remove the old packages and install the new ones. I use the following procedure instead, since I believe it is safer:

1. Backup files;
2. Remove old software;
3. Install new software;
4. Format those ocfs disks;
5. Restore files.

I once had two boxes with exactly the same size disks and put exactly the same files on them. One used 1.0.8, the other 1.0.9. The df -k results were different.
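
To put a number on that kind of difference, the df -k output from both boxes can be parsed and compared per mount point. This is only a sketch under assumptions: the function names are hypothetical, and it expects each file system on a single six-column line, so wrapped lines from long device names are skipped.

```python
from typing import Dict


def parse_df_k(output: str) -> Dict[str, Dict[str, int]]:
    """Parse `df -k` output into {mount point: sizes in KB}.

    Assumes each file system is reported on a single line with six columns.
    """
    result = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue  # wrapped or malformed lines are ignored in this sketch
        total, used, available = (int(x) for x in fields[1:4])
        result[fields[5]] = {"total": total, "used": used, "available": available}
    return result


def used_kb_difference(df_a: str, df_b: str, mount_point: str) -> int:
    """Used KB on box A minus used KB on box B for one mount point."""
    return parse_df_k(df_a)[mount_point]["used"] - parse_df_k(df_b)[mount_point]["used"]
```

Passing the captured output of both boxes and a shared mount point to used_kb_difference gives the gap in KB.
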

However, upgrading this way means a couple of hours of downtime. Naturally, the manager is not very happy about that.

>4. Ensure ocfs is listed in PRUNEFS in /etc/updatedb.conf.
 Yes
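
The PRUNEFS setting can also be checked mechanically. A small sketch follows, assuming /etc/updatedb.conf holds a line of the form PRUNEFS="type1 type2 ...". The function names are hypothetical.

```python
import re
from typing import List


def prunefs_entries(conf_text: str) -> List[str]:
    """Return the file system types on the PRUNEFS line, or [] if there is none."""
    for line in conf_text.splitlines():
        match = re.match(r'\s*PRUNEFS\s*=\s*"?([^"#]*)"?', line)
        if match:
            return match.group(1).split()
    return []


def ocfs_is_pruned(conf_text: str) -> bool:
    """True when ocfs appears among the PRUNEFS entries."""
    return "ocfs" in prunefs_entries(conf_text)
```

Reading the file and passing its text to ocfs_is_pruned is enough to confirm that updatedb will skip the ocfs mounts.
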
>5. The process crashed is find. What were you doing when it crashed?
Not sure. It could happen at any time. The last two times it went like this: my alert.log monitoring sent me errors because the DB could not do a log switch, even though there was plenty of space in the archived log directory. Then the box appeared to hang, and crashed.

Thanks,
Bin

Received on Thu Sep 25 2003 - 19:33:20 CDT
