Re: question concerning redo logs and recovery
as sybrand mentions, you can set a parameter in initsid.ora, i believe it is
log_block_checksum = true
(check the documentation for 8.0.5, to make sure that this is the correct parameter syntax before you make a change to your initsid.ora and bounce the instance.)
this will add some overhead... the overall performance impact could be significant, depending on the workload (i.e. the amount of redo being generated)
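for reference, a minimal sketch of what that initsid.ora entry might look like (as above, double-check the exact parameter name and default against the 8.0.5 reference manual before relying on it):

    # initsid.ora -- sketch only; verify the exact name/default in the 8.0.5 docs
    # when set to true, a checksum is computed for each redo block before it is
    # written to the log, so a damaged block can be caught when the redo is read
    # back (e.g. by recovery or the archiver)
    log_block_checksum = true

after bouncing the instance you can confirm the setting with something like "show parameter log_block_checksum" from svrmgrl or sql*plus.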
it does sound as if the online redo log files are being "duplexed" if each log group contains three members, each member on a separate disk.
as i understand it, when redo blocks are written to a "duplexed" redo log, each member is written to sequentially. that is, the i/o to the second member is not performed until the i/o to the first member has completed.
if you have only one member, and the log file is "mirrored" (either by o/s or hardware), then oracle is only writing the redo block one time, and performing one i/o. the "mirroring" software/hardware will handle the physical i/o to each device.
so there is a difference between "duplexing" and "mirroring"... with "duplexing", oracle will be writing the log block multiple times. if the log file has only one member, then oracle has only one shot to get it right, and it's left to the "mirroring" software/hardware to faithfully replicate the block to multiple devices.
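(as an aside, for anyone who wants oracle-level duplexing on top of, or instead of, o/s mirroring: a rough sketch of checking the current layout and adding a second member to a group is below. the file name is made up -- adjust it to your own layout, and repeat for each group.)

    -- sketch: how many members does each group have, and where do they live?
    SELECT group#, members, status FROM v$log;
    SELECT group#, member FROM v$logfile ORDER BY group#;

    -- sketch: add a second member on a different disk to group 1
    -- (the path below is only an illustrative name)
    ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/SID/redo01b.log' TO GROUP 1;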
i find it highly unlikely that this is a problem with the disks, since all three of them show corruption of the same block. if the disks are on separate scsi bus controllers, then it is unlikely that all of the controllers would have the same problem with the same block.
so it sounds as if the redo block got corrupted sometime before it was written to the redo log files. and that sounds like a problem with the oracle software, the o/s, or with physical memory.
one final note: i believe (for Oracle 8.0.6) that when a COMMIT statement is issued, oracle will NOT return until the redo blocks have been written to the log files. oracle may well have written the redo blocks before the COMMIT is even issued, but you are guaranteed that they have been written to the log files by the time the COMMIT returns. the COMMIT does NOT guarantee that data blocks have been written to the datafiles.
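(if you want to see that wait in action, commits waiting on the redo write show up under the "log file sync" wait event, and lgwr's own writes to the members under "log file parallel write" -- a quick sketch against v$system_event, which should be available in 8.0:)

    -- sketch: cumulative waits for commits waiting on redo writes,
    -- and for lgwr writing to the log file members
    SELECT event, total_waits, time_waited
      FROM v$system_event
     WHERE event IN ('log file sync', 'log file parallel write');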
HTH
"Sybrand Bakker" <postbus_at_sybrandb.demon.nl> wrote in message
news:tgc5uk533kg9f2_at_beta-news.demon.nl...
> Your post shows *exactly* why you should mirror/duplex online redolog files.
> It looks like you didn't do that.
>
>
> Further answers embedded.
>
> Hth,
>
> Sybrand Bakker, Oracle DBA
>
> "Petra Hein/Gerald Bauer" <Petra.Hein-Gerald.Bauer_at_t-online.de> wrote in
> message news:9e4vs3$65i$01$1_at_news.t-online.com...
> > Hi,
> >
> > I just experienced a crash of Oracle 8.0.5. Our AIX-Server was hanging and
> > there was no way to shut down Oracle properly. The database is running in
> > archive mode.
> > The consequences: all 3 members of the CURRENT redo log group were
> > corrupted. The corrupted members of the group were distributed across 3
> > different disks. The size of the redo logs is 50MB. Trying to clear the current
> > log file group was not possible, so I started recovery ... There was a
> > corrupted block inside EACH CURRENT redo log file, which could not be read
> > by the recovery-process. The recovery-process reported that the corrupted
> > block inside the current redo logs was from yesterday ...
> >
> > So I recovered the database until yesterday, but I lost half a day of work
> >
> > Now my questions:
> >
> > Is there any chance to detect as early as possible if a redo log is
> > corrupted and which parameters do I have to specify in init.ora ?
> The only thing you can do is set log_block_checksum to true. Of course this
> has an adverse effect on performance.
>
> > Does it help reducing the size of redo log files ?
>
> No, definitely not, it has nothing to do with it.
> It will however increase the chance of the checkpoint not complete problem.
> > With redo log size being smaller, will a corrupted redo log be detected earlier ?
>
> Again has nothing to do with it.
> > Is the archiver-process able to detect a corrupted redo log file ?
>
> I don't think so, archiving is just a plain copy.
> > Does a log switch check the redo logs for consistency ?
> >
>
> Not sure about this
>
> > Can anyone explain to me, when exactly the redo log buffer is written to
> > the current redo log file ?
>
>
> When 1/3 of log_buffer is dirty, or every 3 seconds, whichever occurs
> first.
> > Is the redo log buffer written to the current redo log file as soon as a
> > transaction is finished by issuing a commit ?
> >
> No, much earlier. A commit does however force a checkpoint.
>
> > Anyway, I'm now going to reduce size of redo logs
>
> I would advise against that, it will have an adverse effect on performance.
>
> > and adjust parameters to have checkpoints triggered more frequently ..
>
> Ditto
>
> > but maybe someone has other ideas which could prevent such a scenario as
> > described in the first lines above
>
>
>
> Buy better hardware. How come 3!! disks show corruption?
> > ...
> >
> > I currently can't see a possibility to prevent 100% a crash as described
> > above (except standby database or replicated database nodes) ... hopefully
> > you never will experience this case ...
> >
> > Many questions, hopefully at least some answers ...
> > thanks in advance,
> >
> > Gerald
> >
> >
> >
>
>
>
Received on Sat May 19 2001 - 09:06:57 CDT