Re: Media recovery when using an old controlfile

From: Richard Foote <richard.foote_at_bigpond.com>
Date: Wed, 25 Jun 2003 23:33:28 +1000
Message-ID: <WdhKa.1167$p8.43445@newsfeeds.bigpond.com>

"Jurjen Oskam" <joskam_at_quadpro.stupendous.org> wrote in message news:slrnbfio00.ak2.joskam_at_calvin.quadpro.stupendous.org...
> Hi everybody,
>
> I'm experimenting with several options regarding the backup and recovery
of an
> Oracle 8.1.7 database.
>
> On the production machine, I first created a copy of the current
controlfile.
> After some seconds, I did the equivalent of a SHUTDOWN ABORT.[1]
>
> Then, I copied *all* the files over to another machine, along with the
copy of
> the controlfile. I changed the control_files parameter on the backup
machine
> to the copy of the controlfile.
>
> If I understand correctly, this results in (on the second machine)
datafiles
> that need media recovery, with a not-current controlfile.
>
> The situation is that I cannot start this database. First, I did a STARTUP
> MOUNT. Then, I did RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL.
> This results in the following:
>
> ORA-00279: change 1702722594 generated at 06/25/2003 08:22:23 needed for
thread 1
> ORA-00289: suggestion :

/oracle/app/oracle/admin/prod1/arch/arch1_49248.rdo
> ORA-00280: change 1702722594 for thread 1 is in sequence #49248
> Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
> CANCEL
> ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error
below
> ORA-01194: file 1 needs more recovery to be consistent
> ORA-01110: data file 1: '/dbentry/prod1/div1_500_01'
>
>
> However, sequence #49248 was not generated yet at the time of the SHUTDOWN
> ABORT of the original database. And indeed, when I try to open the
database
> with RESETLOGS, I get the error I was warned about above.
>
> Is it possible to do media recovery with an older version of a
controlfile?
> And if it's possible, is it sensible? :-)
>
>
>
> Right after the above failure to start the database with the backup
> controlfile, I changed the control_files parameter to the controlfile that
> was current at the time of the SHUTDOWN ABORT. With this controlfile,
> I was able to do a STARTUP without problems at all.
>
>
>
> [1] I didn't really SHUTDOWN ABORT the production database. What I did
> was atomically make a copy of the disks the database is on, and this
> copy was presented to the second host.
> --

Hi Jurjen

OK, let's see if this makes sense ;)

When you copied the datafiles across on the fly, they're inconsistent as a checkpointed shutdown was not perform. Therefore, the headers of the files have various and differing SCNs littered among them. Assuming that the last completed checkpoint was performed during the previous log switch, any inconsistencies can be resolved within the current, online redo log. By that I mean that any changes to uncommited blocks written to disk (but recorded within the undo segments) and any committed blocks found within the buffer cache (but not written to disk) *must* all be found within this current online redo log. Therefore in the case of an instance failure, or a shutdown abort, or indeed your "inconsistent" copy of the datafile, Oracle will require this current online redo log in order to apply all the committed changes and clean out all the uncommited changes. That is recover the database.

One word of warning here though is that because you didn't (I assume) put the tablespaces into backup mode, your copy of the database doesn't quite equate to a shutdown abort scenario. This is because a block as it's being copied could have a portion of it copied but then have DB Writer come and overwrite the remaining portion of the block. When this remaining portion gets copied we now have an inconsistent (or "fractured" block) which Oracle will *not* be able to recover. Oracle will require a consistent copy of this block from which to recover, that which is recorded by Oracle in it's redo stream when a tablespace is in backup mode (one of the key reasons why doing so is so important for hot backups).

Assuming we don't have any fractured blocks let's move on. You now have a set of inconsistent datafiles and a old or previous copy of the control file. This control file is not going to be an issue for us, *providing* it still describes the physical details of the current database, which I assume it does.

When you quite correctly run your recovery statement, you have stated to Oracle that the controlfile is not current and hence can't be used to police the required SCN of the *current* database. In other words you wish to perform an incomplete recovery and recover the controlfile as well. Therefore Oracle investigates the headers of the datafiles and determines which is the oldest, non read-only datafile as described in the control file. This is the point (and the redo log sequence) from which we wish to recover the database. Assuming that the last completed checkpoint was performed during the previous log switch, this means it's the current online redo that Oracle requires. That is, log 49428 is the current online redo log and as such can not possibly be archived.

But you don't allow Oracle to perform any recovery, you simply issue the cancel command. That means that the data files remain inconsistent. There is currently a datafile that has not be written to before the SYSTEM datafile, has an SCN previous to that recorded in the SYSTEM datafile and hence is at a future point in time to that guaranteed to that in SYSTEM. Therefore Oracle is complaining that SYSTEM (datafile 1) must be further recovered before Oracle will allow you to open the database. Your incomplete recovery was successful (as such) but you must perform more recovery and ensure that *all* datafiles are consistent until Oracle is going to be happy with you.

To recover the database, you must therefore copy the current online redo log across and *completely* apply all changes associated with this redo log. Then all your datafiles *must* be consistent and when it now asks for the next redo (which Oracle will do BTW because the control file will not know it's reached the end, it's not the *current* redo log remember) you can safely issue the cancel command.

Generally you perform an incomplete recovery by restoring all the necessary datafiles and recovering forward to the required point in time (or redo log). In your scenario, because all the datafiles are kinda current, they can only become consistent once all the generated redo logs have been applied (importantly, including the current one).

Make sense ?

Richard Received on Wed Jun 25 2003 - 08:33:28 CDT