RE: db corruption

From: Bobak, Mark <Mark.Bobak_at_il.proquest.com>
Date: Tue, 15 Aug 2006 13:17:35 -0400
Message-ID: <AA29A27627F842409E1D18FB19CDCF270922B14C@AABO-EXCHANGE02.bos.il.pqe>

Yep, I'm saying that when I see the error, it always occurs on the last block of the datafile, and precisely one block is zeroed out, to the byte. In 8.1.7.4 there was a nasty side effect, due to a bug: RMAN would try to read the disk, error out, read the mirror, error out, then bounce back to the disk, back to the mirror, and get caught in an endless loop, spewing *lots* of errors to the alert.log. We opened a TAR and got a patch for that, which allowed us to back up the problem file(s). At that point, doing a restore re-formatted the problem block, and the problem disappeared.
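For anyone who wants to check for the same symptom outside of Oracle, here is a minimal sketch of the test described above: read the final block of a file and see whether every byte is zero. It is only an illustration, not the procedure we used. The function name, the 8192-byte default block size, and the assumption that the file length is a whole multiple of the block size are all mine. On a raw volume the datafile may end before the device does, so the offset would have to come from Oracle's own block count instead.

```python
import os


def last_block_is_zeroed(path, block_size=8192):
    """Return True if the final block_size bytes of the file are all zero.

    Hypothetical helper: block_size is an assumed value and must match
    the database block size of the datafile being checked.
    """
    with open(path, "rb") as f:
        # Find the end of the file, then step back exactly one block.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size < block_size:
            raise ValueError("file is smaller than one block")
        f.seek(size - block_size)
        block = f.read(block_size)
    # A fully zeroed block compares equal to block_size zero bytes.
    return block == b"\x00" * block_size
```

Called as last_block_is_zeroed("/path/to/datafile.dbf"), where the path is a placeholder, it returns True only when the whole trailing block is zero bytes, which is the "to the byte" pattern described above.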

Just recently, I saw the same occurrence again in 9.2.0.6, this time on just a single datafile, and 9.2.0.6 fortunately doesn't have the bug that causes RMAN to get stuck in the endless loop. I'm convinced this is some obscure bug, but without a neat, tidy, reproducible test case, I hate the idea of even thinking about opening an SR.

This reminds me of the days when we were on Dynix/ptx and found a really obscure bug in the kernel's filesystem layer that caused archive log corruption. That was a fun one... days on end on the phone w/ Oracle kernel developers and Dynix/ptx kernel engineers. Oh yeah, that was fun...

-Mark

--
Mark J. Bobak
Senior Oracle Architect
ProQuest Information & Learning

Ours is the age that is proud of machines that can think and suspicious
of men who try to.  --H. Mumford Jones, 1892-1980


-----Original Message-----
From: Kevin Closson [mailto:kevinc_at_polyserve.com] 
Sent: Tuesday, August 15, 2006 1:07 PM
To: Bobak, Mark; ORACLE-L
Subject: RE: db corruption




>>>Oracle 8.1.7.4 and 9.2.0.6, Solaris Sparc, raw volumes, served up
>>>from an EMC DMX, via VxVM.
>>>
>>>I've had one database where this occurred once.  I've had another
>>>where this happened to 60-70 datafiles.  Exact same type of
>>>corruption, always the last block in the datafile.


Are you saying it is the last block precisely that is zeroed out? And
that is 100% of the block?  Given the config you've described, I think
only Oracle or VxVM could be to blame for that. Since the DMX is
virtualized by VxVM and the problem happened on 60-70 datafiles, I think
the odds that the DMX did it are astronomically small, since those magic
last blocks are strewn all about.
--
http://www.freelists.org/webpage/oracle-l

Received on Tue Aug 15 2006 - 12:17:35 CDT