Re: benefits of multiple switch logfile calls?

From: Howard J. Rogers <hjr_at_dizwell.com>
Date: Sat, 6 Dec 2003 06:53:20 +1100
Message-ID: <3fd0e231$0$13984$afc38c87@news.optusnet.com.au>

"lsattle" <lsattle_at_yahoo.com> wrote in message news:95898986.0312050723.1edf17af_at_posting.google.com...
> Please let me address the criticism of my prior post.

> >
> > Restoring a controlfile unnecessarily is a crazy thing to do, since it
> > requires the use of the 'until cancel' syntax, which itself requires that
> > the database then be opened with a resetlogs, which thus means no prior
> > backups or archives are usable without severe hoop-jumping. It is, in short,
> > an incredibly expensive way of doing recovery. And almost certainly is
> > totally unnecessary.
>
> Let me say one overriding statement. My goal is to evade the horrible
> oracle error "further media recovery needed" (or something like that).
> I have tested a lot of recoveries and seen too many of these errors.

I never have. And it should never happen if everything is being done right.

> The reason I said "our recovery command is" is because we test our
> recovery plans at a disaster recovery site and that is what we use
> there. I'm talking about many databases ranging up to 1TB. I'm not
> concerned with having to "open the db with resetlogs" at a d/r
> recovery. Let's talk about the loss of a db file later.
>
> I have learned that restoring at d/r and cloning are similar
> processes. When I clone I do most of the same things as a full restore,
> along with renaming the files (by rebuilding the controlfile), etc. I
> use the same "recover database using backup controlfile until cancel"
> to clone. Note: a clone can be built much like a hot backup, but with
> disk-to-disk copy commands instead of backing up to tape.

I haven't really worked out what you are talking about here. But if you're talking cloning, then clone. If you're talking backup and recovery, then let's talk about that, and do it right.

If you are talking cloning, it is in any case far better to use the trace file backup of the control file, which does a bit of free database recovery for you, too. There's seldom, if ever, a need to use a binary backup of the controlfile (RMAN users excepted).

> >
> > It also explains why the original poster complains about the recovery
> > process using redo from a time after the backup completed: because the
> > restored controlfile has no idea when to stop recovering, because it is
> > itself out of date.
> >
> > A one-size-fits-all recovery strategy is feeble DBA-ing. No offence.
>
> What I believe you are saying above is that the original poster got
> the dreaded
> "further media recovery needed" message. Please remember this is
> exactly what I am trying to avoid.

It is quite possible that I've done more recoveries than you've had hot dinners, and I've never gotten that message once. But if you remove the controlfile and replace it with a binary backup of the controlfile, then of course you can expect that message.

>Most of the details in my plans
> that differ from oracle's original writeups on how to do this are
> there to patch holes I found in their way.

Well, you'd have to be a little more specific about all these holes if you want that statement to carry weight. The 'file 1 requires more redo to be consistent' error is precisely what happens when you attempt an incomplete media recovery (which you seem to be doing) but fail to restore all datafiles. Or when you restore a binary version of the controlfile that doesn't actually describe the database being recovered (datafiles marked offline in the controlfile but not actually offline in the database, and so on).

Recovery scenarios are describable, and testable. And any such 'holes' could be easily described and proposed for testing by you.
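To make that concrete, here is a deliberately simplified toy model in Python. It is not Oracle's code or data structures. It assumes each datafile on disk carries two numbers: the checkpoint SCN in its header, and the SCN that redo must reach before a hot copy stops being fuzzy (roughly, the point at which its tablespace left backup mode). The names `DatafileCopy` and `inconsistent_files` are made up for illustration. A copy that is still fuzzy at the stop point needs more redo; a file that was never restored, and so is newer than the stop point, can't be made consistent with the rest by stopping there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatafileCopy:
    """Hypothetical toy record for one datafile as it sits on disk before opening."""
    file_no: int
    checkpoint_scn: int   # SCN in the file header (toy assumption)
    end_backup_scn: int   # redo must reach this SCN before a hot copy is consistent


def inconsistent_files(files, stop_scn):
    """Return (file_no, reason) for every file that would block an open at stop_scn.

    Toy model only: a file is usable if it is not ahead of the stop point
    and redo up to stop_scn has carried it past its fuzzy region.
    """
    problems = []
    for f in files:
        if f.checkpoint_scn > stop_scn:
            # e.g. a file that was never restored: it is ahead of everything else
            problems.append((f.file_no, "newer than recovery stop point"))
        elif f.end_backup_scn > stop_scn:
            # hot copy still fuzzy: recovery stopped too early for this file
            problems.append((f.file_no, "still fuzzy: needs more redo"))
    return problems
```

With two restored copies, say `DatafileCopy(1, 1000, 1500)` and `DatafileCopy(2, 1100, 1400)`, stopping at SCN 1450 leaves file 1 fuzzy, while stopping at 1500 clears both. Leave a current, unrestored file in the mix and no stop point inside the backup's range will satisfy it.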

> > > In fact what we do is not restore the online redo logs at all, we let
> > > the database rebuild them during the recovery. I will explain why.
> > >
> > > We do backups similar to you, but use netbackup to copy the files to
> > > tape. This copy can take from 10 minutes to 6 hours depending on the
> > > size of the db. If we let the system backup the redo logs, who knows
> > > where in the time frame they will get backed up. If we back them up
> > > after the "end backup" they will contain advanced data.
> >
> > You can never, ever backup online redo logs hot. Period. It's not a question
> > of "who knows when in time they will get backed up", but simply that you can
> > never copy anything hot in Oracle without the resulting output files being
> > internally inconsistent and all over the place, and hence unusable. Applying
> > redo, however, makes a hot-copied datafile consistent again. And the 'backup
> > controlfile' command is cunningly written by Oracle to produce a
> > read-consistent snapshot of the controlfile. So for data files and control
> > files, Oracle has provided a mechanism that makes an inconsistent hot copy
> > usable. There is no such mechanism for the redo logs themselves. Therefore,
> > they can't be copied. Simple as that.
>
> We fully agree on online redo logs. This is where I was shooting
> myself in the foot for a long long time. If my memory is right,
> Oracle's original doc on this never addressed online redo logs.
> Yesterday a couple of us here looked up some more recent docs and
> they now say NOT to back up the online redo logs. We still back them up
> but never use them in the recovery. (we restore them and rename
> them). Don't be too critical here. With a product like netbackup it
> is much easier to backup / restore everything than trying to maintain
> that kind of detail in the netbackup definitions. They can be a
> maintenance headache. It's easier to just rename and/or delete the
> restored files at d/r.

Sorry, but I fundamentally disagree. You've got data, money and possible legal liability resting on your ability not to lose data. That means knowing how to recover correctly. And it means knowing how to back up correctly. And if I were auditing any company that included online redo logs in their hot backups, despite all their protests and assurances that they really know what they're doing, I'd mark it as a high-risk item that needs sorting. If netbackup has no way of avoiding those logs being included, it's a crap product and needs dumping. If it *does* have a way of avoiding the backup, but the users just haven't bothered to configure it to do so, then the product is at fault for making it too hard, or the users are at fault for not caring enough.
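The 'internally inconsistent' point quoted earlier is easy to see in a toy simulation. The Python below is a hypothetical model (`hot_copy` is a made-up name, and this is not how netbackup or any other tool actually reads files): it copies a file block by block while writes keep landing on it. The resulting copy matches neither the file as it was when the copy started nor the file as it was when the copy finished. For a datafile, redo can repair that; for an online redo log there is nothing further to apply, which is why a hot copy of one is useless.

```python
def hot_copy(blocks, writes):
    """Copy a file one block at a time while it is still being written.

    blocks: list of block contents when the copy starts.
    writes: dict mapping step -> list of (block_no, new_value); those writes
            are applied to the live file just before block `step` is read.
    Returns (copy, live_file_after_copy). Toy model only.
    """
    live = list(blocks)  # the "live" file, so the caller's list is untouched
    copy = []
    for i in range(len(live)):
        for block_no, value in writes.get(i, []):
            live[block_no] = value
        copy.append(live[i])
    return copy, live
```

For example, `hot_copy(["a0", "b0", "c0", "d0"], {2: [(0, "a1"), (3, "d1")]})` lands both writes just before block 2 is read, so the copy keeps the old block 0 but picks up the new block 3: `["a0", "b0", "c0", "d1"]`, which is neither the starting nor the finishing state of the file.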

> > Before worrying about the order of steps, I'd take a serious look at your
> > entire understanding, or lack thereof, of backup and recovery procedures.
> > Yours are a mess, frankly.
>
> We have good reason to do both versions of the controlfile backup. We
> use version 1 to do d/r recoveries.

Why?

>We do version 2 to have a
> readable, clone-ready copy of the control file outside the database.
> It can come in handy at d/r, or be useful who knows when. It's a
> nice security blanket.

I wouldn't lose sleep over taking two copies of the controlfile, when all is said and done. But there is seldom a need for the binary version of the backup. Anything it can do, the tracefile version can do equally well, and usually rather better. RMAN users excepted, as ever.

> >I think we
> > > should get the controlfile backup first, and then get the log switch.
> > > Note we do restore the backup of the control file before the recovery
> > > because again who knows where in the backup process the controlfile
> > > was backed up.
> >
> > Oh dear.
>
> I researched this after my post yesterday. The host sleep 10 allows
> the redo log to get copied to the archive area. However it also
> allows any checkpoint / SCN type changes to occur in the 10 second
> time period that makes the controlfile backup out of date and more
> advanced than the archived redo logs. That's why I questioned our
> procedures at the time of my post yesterday. This 10 second time
> frame could, technically speaking, have caused a dreaded "further media
> recovery required".

It really has little or nothing to do with it. Whenever you copy the control file, it will be out of date. You could wait 10 hours or 10 nanoseconds. It makes no difference. The second you ask Oracle to back it up to a binary file, a read-consistent snapshot of the thing is taken, and by definition, a snapshot will be out of date the moment it's been snapped.

> Let me explain how. If 1/2 of a redo log got used during that 10
> seconds along with a checkpoint and the backup controlfile knew of the
> advanced SCN numbers then the use of that controlfile would have
> oracle asking for redo that was beyond the backup. If no more
> archiving occurred by the time the archive directory got copied to tape
> then the backup of the redo would not be advanced enough to avoid
> "further media recovery required"

Incorrect, I'm afraid. A complete recovery takes the SCN of the restored datafile as its starting point, and the end of redo marker as its ending point. But if you throw a binary version of the control file in there, then all bets are off.
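As a rough sketch of that start-and-end rule, here is a toy Python model. `RedoLog` and `logs_for_complete_recovery` are hypothetical names, and real recovery tracks far more than this. It assumes each log is described by its sequence number and the SCN range it covers, with the last log in the list standing in for the current online log, i.e. the end of redo. Given the SCN in a restored datafile's header, it returns the sequences that must be applied and refuses if the chain is broken. Note that nothing in it consults a control file's idea of where to stop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoLog:
    """Hypothetical toy description of one redo log (archived or current)."""
    sequence: int
    first_scn: int  # lowest SCN recorded in this log
    next_scn: int   # first SCN of the following log (exclusive upper bound)


def logs_for_complete_recovery(datafile_scn, logs):
    """Sequences to apply, from the restored file's SCN to the end of redo.

    Toy model: the starting point comes from the datafile header SCN, and the
    ending point is simply the last log available (the end of the redo stream).
    """
    ordered = sorted(logs, key=lambda log: log.sequence)
    # Skip logs whose whole SCN range lies below the file's SCN.
    needed = [log for log in ordered if log.next_scn > datafile_scn]
    if not needed or needed[0].first_scn > datafile_scn:
        raise ValueError("redo covering the datafile's SCN is missing")
    # Every log must pick up exactly where the previous one left off.
    for prev, cur in zip(needed, needed[1:]):
        if cur.first_scn != prev.next_scn:
            raise ValueError(f"gap between sequence {prev.sequence} and {cur.sequence}")
    return [log.sequence for log in needed]
```

For example, with sequences 10 to 12 covering SCNs 100 to 399, a file restored at SCN 250 needs sequences 11 and 12, and a file at SCN 150 needs all three. If sequence 10 has gone missing, the function raises rather than guess.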

> My research after my post yesterday shed some light on why we won't
> have the issue. (You alluded to this answer earlier). When we say
> "using backup controlfile" I believe oracle ignores what's in the
> controlfile SCN wise and bases the recovery on the file headers.

Yes, it does. These things don't have to be a matter of belief. Your use of the binary version of the backup control file is the only reason why you are getting the 'file 1 needs more recovery to be consistent' error message.

>If
> this is the case our recovery is solid.

Well, we'll have to agree to disagree on that. Any recovery scenario that's as messy as yours is not what I'd call solid. It appears not to be based on any great understanding of how Oracle does the deed, but merely on particular scenario testing. That's not solid, that's "let's pray the scenario never changes, otherwise we'll have to rethink the whole thing".

> As stated earlier my mentality is a d/r recovery.

Which still doesn't demand or require the use of the binary version of the controlfile.

>For partial
> recoveries of lost files, etc your logic is right. We use what you've
> said above for those circumstances.
>
>
>
> Thanks for your analysis. However I have to stand by our tested out
> techniques. We have been doing at least 1 d/r recovery for every one
> of our production databases in recent years. We have had some file
> recoveries in house to deal with also. I had problems doing "hot
> copy" clones at times that really made me dig and get our procedures
> to be solid. We have around 5 DBAs who work in this area and
> probably 10 unix sysadmins.

Fine. Understand that this is a public forum, so others will read your posts. Therefore, there's a responsibility (I think) on all of us to post things of general applicability, not procedures which happen to work well for a particular scenario you are working on. If it works for you, of course stick with it. But we should all be careful to draw a distinction between something that 'works' and something that's best practice. What you have described may well work for you, but as an example of best practice, based on a sound theoretical understanding of Oracle's recovery mechanism, it doesn't stand up too well. I challenge it, therefore, not to have a go at you, but to try and put a marker down so that others may know this is not what they should be doing, that it involves a lot of unnecessary work and caveats, and that there are better ways of doing things. That's all.

Regards
HJR

-- 
------------------------------------
Oracle insights at www.dizwell.com
------------------------------------

Received on Fri Dec 05 2003 - 13:53:20 CST