Re: benefits of multiple switch logfile calls?

From: lsattle <lsattle_at_yahoo.com>
Date: 5 Dec 2003 07:23:57 -0800
Message-ID: <95898986.0312050723.1edf17af@posting.google.com>

Please let me address the criticsm to my prior post.

"Howard J. Rogers" <hjr_at_dizwell.com> wrote in message news:<3fcf8f06$0$20308$afc38c87_at_news.optusnet.com.au>...
> "lsattle" <lsattle_at_yahoo.com> wrote in message
> news:95898986.0312040755.16c6f7ac_at_posting.google.com...
>
> >
> > I'm not sure of where you are mentally on all this. The switch log
> > files / archive log files, to my knowledge is intending to get any
> > current redo out of the online redo logs and into the archive redo
> > logs.
> >
> > I'm not completely clear on switching vs archiving. I would think
> > that just archiving would be enough.
>
> There's no difference. If you 'archivelog current', a log switch is induced
> anyway.

Thanks for the clairification here.

>
> >
> > What I am certain of is that while your tblspaces are in "begin
> > backup" and you are gziping or copy data files, transactions and / or
> > checkpoints are occuring in your system. After you do the "end
> > backup" it is wise to get all redo out to the archive directory.
> >
> > Then when you restore and do the recovery command (our recovery
> > command is "recover database using backup controlfile until cancel")
> > the recovery will only need archived redo logs.
>
> This is a concern. What do you mean "our recovery command is..."?? The
> recovery command that one issues should depend on the nature of the
> recovery, which itself should be dependent on the nature of the failure. And
> in particular, why on Earth so many people enjoy throwing the 'using backup
> controlfile' line in there, I have no idea. You only need that line when
> you've actually restored your controlfile, and you would never need to do
> that if you've (a) multiplexed your controlfiles as you should (b) done
> backup controlfile to trace as you should and (c) mirrored your controlfiles
> as you should.
>
> Restoring a controlfile unnecessarily is a crazy thing to do, since it
> requires the use of the 'until cancel' syntax, which itself requires that
> the database then be opened with a resetlogs, which thus means no prior
> backups or archives are usable without severe hoop-jumping. It is, in short,
> an incredibly expensive way of doing recovery. And almost certainly is
> totally unnecessary.

Let me say one overriding statement. My goal is to evade the horrible oracle error "further media recovery needed" (or something like that). I have tested a lot of recoveries and seen too many of these errors.

The reason I said "our recovery command is" is because we test our recovery plans at a disaster recovey site and that is what we use there. I'm talking many databases ranging up to 1TB. I'm not concerned with having to "open the db with resetlogs" at a d/r recovery. Let's talk the loss of a db file later.

I have learned that restoring at d/r and cloning are similar processes. When I clone I do most the same things as a full restore along with renamimg the files (by rebuilding the controlfile), etc. I use the same "recover database using backup controlfile until cancel" to clone. Note A clone can be built similar to a hot backup, but with copy commands disk to disk instead of backing up to tape.

>
> It also explains why the original poster complains about the recovery
> process using redo from a time after the backup completed: because the
> restored controlfile has no idea when to stop recovering, because it is
> itself out of date.
>
> A one-size-fits-all recovery strategy is feeble DBA-ing. No offence.

What I believe you are saying above is that the original poster got the dreaded
"further media recovery needed" message. Please remember this is exactly what I am trying to avoid. Most of the details in my plans that differ from oracle's original writeups on how to do this are there to patch holes I found in their way.

> >
> > In fact what we do is not restore the online redo logs at all, we let
> > the database rebuild them during the recovery. I will explain why.
> >
> > We do backups similar to you, but use netbackup to copy the files to
> > tape. This copy can take from 10 minutes to 6 hours depending on the
> > size of the db. If we let the system backup the redo logs, who knows
> > where in the time frame they will get backed up. If we back them up
> > after the "end backup" they will contain advanced data.
>
> You can never, ever backup online redo logs hot. Period. It's not a question
> of "who knows when in time they will get backed up", but simply that you can
> never copy anything hot in Oracle without the resulting output files being
> internally inconsistent and all over the place -and hence unusable. Applying
> redo, however, makes a hot-copied datafile consistent again. And the 'backup
> controlfile' command is cunningly written by Oracle to produce a
> read-consistent snapshot of the controlfile. So for data files and control
> files, Oracle has provided a mechanism that makes an inconsistent hot copy
> usable. There is no such mechanism for the redo logs themselves. Therefore,
> they can't be copied. Simple as that.

We fully agree on online redo logs. This is where I was shooting myself in the foot for a long long time. If my memory is right, Oracle's original doc on this never addressed online redo logs. Yeseterday a couple of us at here looked up some more recent doc and it now says to NOT backup the online redo logs. We still back them up but never use them in the recovery. (we restore them and rename them). Don't be too critical here. With a product like netbackup it is much easier to backup / restore everything than trying to maintain that kind of detail in the netbackup definitions. They can be a maintenance headache. It's easier to just rename and/or delete the restored files at d/r.

>
> >So we like to
> > control the whole situation by doing the "end backup" followed by:
> >
> > alter system checkpoint;
> > alter system switch logfile;
>
> You're controlling nothing here. You're merely duplicating expensive
> checkpoints, since one is issued at a log switch in any case.
>
> > host sleep 10
> > alter database backup controlfile to '$gOUT_DIR/cntrlfile';
> > alter database backup controlfile to trace;
>
> And the point of doing two controlfile backups is???
>
> > Note I now believe these last steps are out of order.
>
> Before worrying about the order of steps, I'd take a serious look at your
> entire understanding, or lack thereof, of backup and recovery procedures.
> Yours are a mess, frankly.

We have good reason to do both versions of the controlfile backup. We use version 1 to do d/r recoveries. We do version 2 to have a readable outside the database clone ready copy of the control file. It can come in handy at d/r or can be usable who knows when. It's a nice security blanket.

>
> >I think we
> > should get the controlfile backup first, and then get the log switch.
> > Note we do restore the backup of the control file before the recovery
> > because again who knows where in the backup process the controlfile
> > was backed up.
>
> Oh dear.

I researched this after my post yesterday. The host sleep 10 allows the redo log to get copied to the archive area. However it also allows any checkpoint / SCN type changes to occur in the 10 second time period that makes the controlfile backup out of date and more advanced than the archived redo logs. That's why I questioned our procedures at the time of my post yesterday. This 10 second time frame could technically speaking caused a dreaded "further media recovery required".

Let me explain how. If 1/2 of a redo log got used during that 10 seconds along with a checkpoint and the backup controlfile knew of the advanced SCN numbers then the use of that controlfile would have oracle asking for redo that was beyond the backup. If no more archiving occured by the time the archive directory got copied to tape then the backup of the redo would not be advanced enough to avoid "further media recovery required"

My research after my post yesterday shed some light on why we won't have the issue. (You alluded to this answer earlier). When we say "using backup controlfile" I believe oracle ignores what's in the controlfile SCN wise and bases the recovery on the file headers. If this is the case our recovery is solid.
>
> "We restore the controlfile because who knows when it was backed up". Why
> restore something that doesn't actually need restoring?? You lose a 2GB
> datafile, you're telling me you restore the entire set of datafiles and the
> controlfiles? Why???? Oracle backup and recovery doesn't need to work like
> this. You only need to restore that which is causing a problem, recover it,
> and Oracle's internal SCN timestamping functions take care of the rest.
> That's why Oracle invented 'recover datafile' and 'recover tablespace'
> commands as well as 'recover database'. The idea is that you analyze what
> has gone wrong with your database, and pick the appropriate command to do
> the necessary recovery work. Your approach would seem to be 'let's not
> analyze anything, but just restore the lot, and recover the lot'. When the
> problem is merely one corrupt block in one small data file that is not in
> any case mission-critical, that would be, er, an interesting approach.

As stated earlier my mentality is a d/r recovery. For partial recoveries of lost files, etc your logic is right. We use what you've said above for those circumstances.

>
> Regards
> HJR
Thanks for your analysis. However I have to stand by our tested out techniques. We have been doing at least 1 d/r recovery for every one of our production databases in recent years. We have had some file recoveries in house to deal with also. I had problems doing "hot copy" clones at times that really made me dig and get our procedures to be solid. We have about around 5 dba's that work in this area and probably 10 unix sysadmins.

Regards,

Lynn Sattler Received on Fri Dec 05 2003 - 09:23:57 CST