Re: rollback and redo questions

From: Doug Cowles <dcowles_at_bigfoot.com>
Date: Tue, 10 Nov 1998 17:47:34 -0500
Message-ID: <3648C285.52E811E2@bigfoot.com>

This was a very good posting on the redo/rollback/datafile issue. One additional question, what is the sequence of these events - are transactions send to the redo logs first, and then to the rollback and db datafiles simutaneously? Or, are they sent to rollback and redo simutanously, and then to the data files (pre-checkpoint), and post-checkpoint. Any other combinations?

Thomas Kyte wrote:

> A copy of this was sent to Doug Cowles <dcowles_at_bigfoot.com>
> (if that email address didn't require changing)
> On Wed, 04 Nov 1998 11:20:39 -0500, you wrote:
>
> >
> >
> >Thomas Kyte wrote:
> >
> >> A copy of this was sent to Doug Cowles <dcowles_at_bigfoot.com>
> >> (if that email address didn't require changing)
> >> On Tue, 03 Nov 1998 23:29:20 -0500, you wrote:
> >>
>
> >> its rollback (undo) headers. In theory, the above output indicates rollback
> >> segment contention however, without seeing the canned script we cannot really
> >> tell. For all we know, the canned script is:
> >>
> >> select 'rollback contention for undo header = 100%' from dual;
> >>
> >> we need to see the question that is being asked of the database -- not just the
> >> answer...
> >>
> >
> >The actual line of code is " select 3 so, 'Rollback conention for undo header
> > (round(max(decode(class, 'undo header', count, 0 )) /
> > sum(count)+0.000000000001,4))*100||'%'||
> > ' (Total requests = '||sum(count)||')'
> > from v$waitstat
> >
> >The acutal output from v$waitstat is
> >CLASS COUNT TIME
> >------------------ ---------- ----------
> >data block 0 0
> >sort block 0 0
> >save undo block 0 0
> >segment header 0 0
> >save undo header 0 0
> >free list 0 0
> >system undo header 0 0
> >system undo block 0 0
> >undo header 36 0
> >undo block 0 0
> >
> >So it's really taking 36 and dividing by 36 and giving me 100%. This can't be
> >right.
>
> Yes it is right. The query is saying "get all of the rollback wait events,
> compute the total number of wait events (36 in your case). for each wait event
> (of which there are 4) compute the percentage of waits CAUSED by this wait
> event."
>
> In your case, 100% of undo waits are caused by undo header waits -- thats what
> the query tells you. What it doesn't tell you is if thats bad. On an active
> system -- 36 waits is nothing.
>
> >
> >What's involved in recovering a rollback segment with UNDO information in it?If it
> >isn't committed, isn't it safe to ignore? Or is it a matter of taking the REDO
> >information and comparing it against the UNDO information.
> >
>
> No, we don't compare redo and undo. It isn't safe to ignore either.
>
> Lets say we are doing an update. We are going to hit 100 blocks. We have 2 log
> files. We are the only one logged in. Very simply case. Our redo logs can
> only hold the changes for 25 of our blocks. So, every 25 blocks we switch logs.
>
> We start the update. We modify blocks 1-25. We switch from redo log 1 to redo
> log 2. This fires a checkpoint. The 1-25 blocks are being written to the
> datafiles (the uncommitted AFTER images are being flushed). The UNDO
> information for these 25 blocks are being written to rollback datafiles
> (flushed) as well.
>
> We continue updating hitting blocks 26-50. When we hit block 50, we have
> filled redo log file 2 and need to reuse 1. IF the checkpoint is complete (if
> blocks 1-25 are in the datafiles on disk and if the rollback is safely on disk
> as well) we can safely REUSE the redo log file because we will not ever have to
> REDO the updates to blocks 1-25 (we may have to UNDO it but not REDO it for
> crash recovery). We can reuse that log file as soon as the checkpoint completes
> (and if running in archive log mode, the redo file has been archived).
>
> So, we continue. We start writing to blocks 51-75, logging redo to redo file 1
> agian... The system crashes somewhere in the middle (power failure).
>
> When the system recovers -- the 2 online redo log files (redo log file 2 and
> then redo log file 1) will be used to recover the database. We will roll
> forward the changes from redo log file 2 (we were in the middle of checkpointing
> the data blocks that have redo in file 2, we need to REDO those changes). We
> will then apply whatever part of redo file 1 is relevant (upto the crash). This
> has also rolled forward UNDO blocks. We safely had UNDO blocks for blocks 1-25
> (from the first checkpoint) -- we now also have the UNDO blocks for 26-50
> (rolling forward redo file 2 did that) and some number of UNDO blocks for 51-75
> depending on how far we got before the crash. The system now realises that this
> transaction needs to be rolled back and uses the UNDO information to rollback,
> we have all of the UNDO blocks for every modified block, we only had REDO for
> some of the blocks (only the blocks we needed it for, the checkpoint allowed us
> to release some of that redo as we would not need it anymore).
>
> so we roll forward and then roll back in a sense. Its much more complicated at
> a low level then this, but this is in fact the gist of what is happening.
>
> >
> [snip]
>
> >>
> >> Everytime a log switch occurrs -- a checkpoint is fired (among other times --
> >> other events may fire a checkpoint but a log switch ALWAYS does). If the data
> >> this checkpoint was flushing to disk *does not complete* by the time we want to
> >> reuse the log file that fired that checkpoint, you will get the infamous
> >> "checkpoint not complete -- cannot allocate new log" message in your alert file
> >> and the system will suspend temporarily -- waiting for that checkpoint to
> >> finish. The scenario you describe above is one the system will not allow itself
> >> to get into.
> >
> >What if the system goes down while it's suspended?
>
> see above -- no problem. We don't reuse the redo until the checkpoint is
> complete, if the checkpoint is partial we still have the redo and can finish the
> checkpoint when the system recovers. Same process applies. The more general
> question in "what if the system goes down while it's in a checkpoint" (cause
> thats what is happening during a suspend, the checkpoint is finishing) and the
> example above assumes that a checkpoint could be in progress at the time of the
> system failure. We were checkpointing the blocks who had redo in log file 2 at
> the time of the crash (maybe -- maybe the checkpoint had finished -- its not
> relevant)
>
>
> Thomas Kyte
> tkyte_at_us.oracle.com
> Oracle Government
> Herndon VA
>
> --
> http://govt.us.oracle.com/ -- downloadable utilities
>
> ----------------------------------------------------------------------------
> Opinions are mine and do not necessarily reflect those of Oracle Corporation
>
> Anti-Anti Spam Msg: if you want an answer emailed to you,
> you have to make it easy to get email to you. Any bounced
> email will be treated the same way i treat SPAM-- I delete it.
Received on Tue Nov 10 1998 - 16:47:34 CST