Re: kernel panic on Red Hat AS because of OCFS

From: snip3r <snip3r_at_nospam.com>
Date: Wed, 01 Oct 2003 13:40:32 -0700
Message-Id: <pan.2003.10.01.20.40.27.398492@nospam.com>

EMC supports the e.25 and e.27 kernels (2.4.9-e.25/e.27).

ocfs_uid_gen needs to be run only once. For more info, check out
Note 249396.1 on MetaLink. The crash in the bug listed was due to
running ocfs_uid_gen multiple times.
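
One way to avoid running it twice by accident is to check for an existing guid first. The sketch below is only an illustration: it assumes the guid is stored as a "guid = ..." line in /etc/ocfs.conf (verify that on your own nodes), and has_guid is a made-up helper name, not part of the OCFS tools.

```python
import re


def has_guid(conf_text: str) -> bool:
    """Return True if the config text already holds a non-empty guid entry.

    Assumption: the guid lives on a line of the form "guid = <value>".
    """
    for line in conf_text.splitlines():
        # Ignore anything after a '#' so commented-out entries do not count.
        stripped = line.split("#", 1)[0].strip()
        if re.match(r"guid\s*=\s*\S+", stripped):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical usage: path is an assumption, adjust for your install.
    with open("/etc/ocfs.conf") as fh:
        if has_guid(fh.read()):
            print("guid already present: do not run ocfs_uid_gen again")
        else:
            print("no guid found: ocfs_uid_gen has not been run yet")
```

The check is read-only, so it is safe to run on a live node before deciding anything.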

Upgrading to 1.0.9-6 only involves rpm -Uvh <new modules>. The disk
format is the same, so there is no need to reformat the volumes.

On the archive volume, ensure all nodes are archiving to different dirs.
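
A quick sanity check for that, as a sketch only: the node names and paths are made up, and you would fill the mapping from whatever each instance actually has configured as its archive destination.

```python
import posixpath
from collections import defaultdict


def shared_archive_dirs(dest_by_node):
    """Return {directory: [nodes]} for every directory used by 2+ nodes."""
    nodes_by_dir = defaultdict(list)
    for node, path in dest_by_node.items():
        # Normalise so "/arch/logs/" and "/arch/logs" count as the same dir.
        nodes_by_dir[posixpath.normpath(path)].append(node)
    return {d: sorted(n) for d, n in nodes_by_dir.items() if len(n) > 1}


if __name__ == "__main__":
    # Hypothetical two-node layout; an empty result means no clash.
    clashes = shared_archive_dirs({
        "node1": "/u02/arch/node1",
        "node2": "/u02/arch/node2",
    })
    print(clashes or "each node archives to its own directory")
```

It only compares path strings; it does not resolve symlinks or look at the disk.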

On Thu, 25 Sep 2003 17:33:20 -0700, wangbin wrote:

> Hi
> Thanks for all the responses.
>
> I totally agree with Dusan. I didn't recommend it, but the manager was
> beaten by oracle marketing ...
>
> I checked (Note 225710.1), and 2.4.9-e.16smp is a supported kernel. I
> replied to Oracle support with
> http://otn.oracle.com/tech/linux/htdocs/linux_techsupp_faq.html, and
> the status of my TAR changed from Waiting for Customer to Awaiting
> Internal Response.
> Upgrading is always an option. However, keeping the production system
> up to date with each release of the Linux kernel and OCFS is really
> painful. Firstly, we boot from EMC, which only supports certain kernel
> versions. Secondly, a new version can bring new bugs.
> For example, our backup solution is the following:
> 1. The production DB uses OCFS on mirrored shared disks. There is a
>    set of disks (BCV) that can be synced with, or split from, the
>    production DB;
> 2. When the backup starts, sync the BCV with the production DB (data
>    files) disks;
> 3. Alter all tablespaces into hot backup mode;
> 4. Split the BCV from the production DB disks;
> 5. Alter all tablespaces back to normal;
> 6. Do the same for the archive logs;
> 7. Then we back up that data to tape, and use it for recovery as well
>    as for building the test database.
> To restore those backup disks to the test RAC, the procedure is:
> 1. Shut down the DB, gsd and oracm on both nodes of the test RAC;
> 2. Unmount all OCFS file systems;
> 3. Restore the data from the BCV to the test RAC's disks;
> 4. Mount the OCFS file systems on both nodes.
>
> All procedures have been tested and have worked for the last five
> months. After upgrading to 1.0.9, I tried it yesterday. On node two of
> the test RAC, it works. However, on node one the box panics every time
> you mount. After more than 20 panics, we tried something different: I
> used ocfs_uid_gen to regenerate the guid, then used ocfstool to mount,
> and it panicked. However, when the system came back, the problem had
> disappeared and I could mount them without any problem. I reckon it is
> a new 1.0.9 feature; it looks like Bug 3134746, which will be fixed in
> ocfs1.0.9.7.
>

>>3. Upgrade ocfs to 1.0.9-6.

> BTW, how do I upgrade ocfs?
> According to the readme, all you need to do is remove the old packages
> and install the new ones. I use the following procedure, since I
> believe it is safer:
> 1. Back up the files;
> 2. Remove the old software;
> 3. Install the new software;
> 4. Format the ocfs disks;
> 5. Restore the files.
> Once, I had two boxes with exactly the same size disks and put exactly
> the same files on them. One used 1.0.8, the other 1.0.9. The output
> of df -k was different.
>
> However, if I upgrade this way, it means a couple of hours of
> downtime. Certainly, the manager is not very happy about it.
>

>>4. Ensure ocfs is listed in PRUNEFS in /etc/updatedb.conf.

> Yes
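
If you want to verify that on every node rather than by eye, here is a small sketch. It assumes the usual shell-style PRUNEFS="..." line in /etc/updatedb.conf; prunefs_entries and ocfs_is_pruned are illustrative names only.

```python
import re


def prunefs_entries(conf_text):
    """Return the filesystem names listed in the last PRUNEFS assignment."""
    entries = []
    for line in conf_text.splitlines():
        # Drop comments, then look for PRUNEFS=... (optionally exported).
        line = line.split("#", 1)[0].strip()
        match = re.match(r"(?:export\s+)?PRUNEFS\s*=\s*(.*)$", line)
        if match:
            value = match.group(1).strip().strip("\"'")
            entries = value.split()
    return entries


def ocfs_is_pruned(conf_text):
    """True if ocfs appears in PRUNEFS (compared case-insensitively)."""
    return "ocfs" in (entry.lower() for entry in prunefs_entries(conf_text))


if __name__ == "__main__":
    with open("/etc/updatedb.conf") as fh:
        print("ocfs pruned:", ocfs_is_pruned(fh.read()))
```

It only reads the file; it does not change updatedb's behaviour.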

>>5. The process that crashed is find. What were you doing when it crashed?

> Not sure. It could be at any time. The last two times went like this:
> I monitor alert.log, which sent me errors because the DB could not do
> a log switch, even though there was a lot of space in the archived log
> directory. Then the box appeared to hang, and crashed.
>
> Thanks,
> Bin
Received on Wed Oct 01 2003 - 15:40:32 CDT