Re: Does an ALTER SYSTEM CHECKPOINT write all dirty buffers to datafiles ?

From: Joel Garry <joel-garry_at_home.com>
Date: 15 Jul 2003 09:44:42 -0700
Message-ID: <91884734.0307150844.49ce862c@posting.google.com>

richard.foote_at_bigpond.com (Richard Foote) wrote in message news:<69f6c1c8.0307141921.45b1c3e_at_posting.google.com>...
> joel-garry_at_home.com (Joel Garry) wrote in message news:<91884734.0307141442.7b7ff68a_at_posting.google.com>...
> > spendius_at_muchomail.com (Spendius) wrote in message news:<aba30b75.0307140556.6a8baf93_at_posting.google.com>...
> > > Does this command really ensure that *all* modified buffers
> > > will be written out to disk?
> > >
> > > I'd just like to make sure (the docs are not very clear about
> > > it, and I found different answers to this question: according to
> > > some it does write all dirty buffers, according to other people
> > > it only updates the SCN in the datafiles' headers).
> > >
> > > Thanks.
> > > Spendius
> >
> > If all it did was update the datafiles' headers, you'd never see
> > "Checkpoint not complete" errors. Google for that.
> >
> > The idea of checkpointing is to be sure redo logs are written. Redo
> > logs are the Achilles' Heel of Oracle. While nothing besides
> > performance is likely to be affected if you are getting "Checkpoint
> > not complete" errors, if your system goes down without finishing the
> > checkpoint, you've quite possibly set yourself up to lose stuff when
> > instance recovery takes place on the next startup.
> >
> Hi Joel,
>
> Couple of points.
>
> The idea of checkpointing is to have a consistent point of time from
> which Oracle can perform instance recovery with the knowledge that all
> changes in the buffer cache prior to the checkpoint have been written
> to the OS. It has nothing to do with ensuring redo logs are written,
> except that the current contents of the redo buffer are flushed to
> the OS as well (to ensure all changes written by DBWR
> are safely recorded in the redo logs).

Yeah, I should know better than to post about this stuff when I'm working on more immediate problems :-) 40 lashes with a wet page from Tom Kyte's book!

>
> If the system goes down without finishing the checkpoint it does not
> mean "you are set up to lose stuff". It simply means Oracle can't
> guarantee that all changes associated with a previous redo log have
> been written to disk therefore the previous redo log would be required
> for instance recovery as well as the current log. *No committed data
> can be lost*, as Oracle guarantees all committed data to be written to
> the redo log files. If a redo log to be overwritten has not yet had
> its corresponding checkpoint complete, Oracle simply waits until the
> checkpoint is complete to ensure that instance recovery can always be
> completed. Oracle prefers to hang the system rather than risk loss of
> data.

So you are saying Oracle waits during a shutdown abort? Besides that, I'm thinking of the situation where someone issues a shutdown immediate, gets impatient with Oracle hanging the system, cancels the operation, tries to abort, and either succeeds or gets "shutdown in progress" and has to go around killing processes... not really Oracle's problem, but problems occur during unusual times. "Hmmmm, why am I getting performance problems now? Why are there so many retries on that disk... awheckthehardware'sgoingflakeyshutitdownnowbeforethingsgototallybonkers!"
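Richard's point about Oracle refusing to overwrite a redo log until its checkpoint completes can be put as a toy model. This is a hedged sketch, not Oracle's code: LogGroup, high_scn, can_overwrite and log_switch are hypothetical names, archiving is ignored, and "checkpoint complete" is reduced to a single checkpoint SCN.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogGroup:
    group_id: int
    high_scn: int  # hypothetical: highest SCN of any change recorded in this group


def can_overwrite(group: LogGroup, checkpoint_scn: int) -> bool:
    # A group is reusable only when every change it protects is already
    # in the datafiles, i.e. the checkpoint has moved past all of it.
    return group.high_scn <= checkpoint_scn


def log_switch(groups: List[LogGroup], current: int, checkpoint_scn: int) -> Optional[int]:
    # Groups are used round-robin. Returning None models the
    # "Checkpoint not complete" wait: hang rather than overwrite redo
    # that instance recovery might still need.
    nxt = (current + 1) % len(groups)
    if can_overwrite(groups[nxt], checkpoint_scn):
        return nxt
    return None


# Group 1 (index 0) is being written; group 2 is the oldest, then group 3.
groups = [LogGroup(1, 300), LogGroup(2, 100), LogGroup(3, 200)]
print(log_switch(groups, 0, 150))  # 1: group 2 is fully checkpointed
print(log_switch(groups, 0, 90))   # None: group 2 still needed, so wait
```

The wait costs performance, but nothing committed is lost: the redo is kept until the datafiles have caught up.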

"Records that Oracle appends to the redo log file after the change record that the checkpoint refers to are changes that Oracle has not yet written to disk. If a failure occurs, then only redo log records containing changes at SCNs higher than the checkpoint need to be replayed during recovery." - 8i tuning manual, perhaps oversimplifying.
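The manual's rule can be illustrated with an equally simplified sketch, assuming a single checkpoint SCN and using hypothetical RedoRecord and records_to_replay names: only records above the checkpoint are rolled forward.

```python
from typing import List, NamedTuple


class RedoRecord(NamedTuple):
    scn: int     # SCN at which the change was made
    change: str  # hypothetical free-text description of the change


def records_to_replay(redo: List[RedoRecord], checkpoint_scn: int) -> List[RedoRecord]:
    # Everything at or below the checkpoint SCN is assumed to be in the
    # datafiles already, so only newer records are rolled forward.
    return [r for r in redo if r.scn > checkpoint_scn]


redo = [
    RedoRecord(101, "update emp"),
    RedoRecord(105, "commit"),
    RedoRecord(110, "insert into dept"),
]
print([r.scn for r in records_to_replay(redo, 105)])  # [110]
```

Note that the uncommitted insert at SCN 110 is replayed too; in real instance recovery such changes are rolled back afterwards, which this sketch leaves out.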

There are also odd arguments to be made about whether people think a transaction is or is not committed, since there can be delays in a multi-tiered system, and parts of the system may die at strange times.

>
> Redo logs are only the "Achilles' Heel" if you lose all members of
> the current redo log group, which is quite a different scenario (and
> one which should never occur if the redo logs are configured
> appropriately).

A stronger argument can be made about modern storage devices lying to Oracle about whether data has actually been written to disk. That, as you say, is a configuration issue.

--
@home.com is bogus.
http://www.signonsandiego.com/news/uniontrib/mon/business/news_mz1b14choney.html

Received on Tue Jul 15 2003 - 11:44:42 CDT