
Re: Online backup: Backup online redologs?

From: Charles Fisher <Charles.Fisher_at_alcoa.com>
Date: Wed, 02 May 2001 21:09:13 GMT
Message-ID: <Pine.GSO.4.31.0105021450240.3474-100000@unknown>

On Wed, 2 May 2001, Howard J. Rogers wrote:

> > I also shouldn't take hot backups of the redo logs, but that seems
> > to work pretty well, doesn't it?

> Er, no. We may be at cross purposes here, in which case I haven't a
> clue what you are on about. But hot backups of redo logs are
> invariably useless for any purpose you care to mention. They are
> internally inconsistent, and there is nothing that can make them
> consistent.

In theory, you are right. However, in practice, you have essentially admitted that you are wrong.

The Oracle documentation does indeed claim that you should never take hot backups of the online redologs because of the danger of running over the 'end of backup' marker in a log replay.

However, in my original post, I did a successful 'RECOVER AUTOMATIC DATABASE' on a restore of a hot backup with hot copies of the redologs. You admit in your original reply that the poster might have "gotten away" with it, even if it was a "lousy technique." The fact that it worked at all demonstrates that it can work in practice.

As I see it, the 'end of backup' marker has been archived if an "ALTER SYSTEM ARCHIVE LOG CURRENT" has been run (after taking the last tablespace out of backup mode, but before using OS utilities to copy). Hopefully, the risk of this circumstance is minimal.
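
To make the ordering concrete, what I have in mind is roughly this (the tablespace name here is just a placeholder):

        ALTER TABLESPACE users END BACKUP;
        ALTER SYSTEM ARCHIVE LOG CURRENT;

...and only after those two statements do the OS copies of the online redologs.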

> > Either it is the same or it is different; you must choose.
 

> I don't have to do anything. My reply is accurate. Both sorts of files
> will be internally inconsistent. In your terms, that means they both have
> been introduced to 'the same sort of corruption' (inconsistency should be
> distinguished from true corruption, but practically they result in the same
> grief). But one you can fix, and the other you can't.

I'm not so sure.

As I understand it, redologs are written sequentially. Transactions with SCNs are written in a linear manner to the redologs, then applied to the datafiles.

If LGWR were to die for any reason, leaving an incomplete record in the redologs, then I assume that on the next instance startup, RECO (or whatever process actually does the recovery) is smart enough to recognize the incomplete record, check that the datafiles do not record that SCN, and then essentially do an fseek() in the active redolog thread and write over that space.

I would assume that this sort of thing happens quite regularly in "SHUTDOWN ABORT" situations.

This may be a rather simplistic view, but the same sort of conditions would exist with a hot copy of the redologs. If a transaction was in-flight, LGWR might have only partially recorded it, in which case it would not have been written to the tablespaces and could be removed with impunity.
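
For what it's worth, the SCNs that recovery compares can be eyeballed from sqlplus; something along these lines should work even on my Oracle 7 instances, as far as I know (illustrative only):

        SELECT checkpoint_change# FROM v$database;
        SELECT group#, sequence#, first_change#, status FROM v$log;
        SELECT name, checkpoint_change# FROM v$datafile;

The datafile checkpoint SCNs lag the head of the redo stream; that gap is what recovery has to close.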

p.s. My books do say that the RECO process is responsible for recovery, but when I run "top" during a "RECOVER AUTOMATIC DATABASE", I just see the DBWR/DBW0 (I'm still using some Oracle 7) consuming the cpu. I wish I knew what process was truly responsible for recovery operations.


> > In a cloning situation, *I only want to apply the redo
> > that is in the archived logs.* I could care less about
> > the online redo - really, absolutely, I don't want it,
> > and I don't want to have to worry about it.
 

> In a cloning situation, you are supposed to shut the thing down first.

That is tremendously inconvenient for me. Let me explain.

The real reason that I got interested in backups is that our 3rd party backup software failed (BMC obacktrack - awful stuff, at least the versions that ran with Oracle 7). I wrote a script to do the backup the "old-fashioned" way; I've attached it.

Another of my Oracle 7 systems needs a production-to-development copy every couple of months or so. I can only do cold backups on the weekend, and I hate coming in on the weekends. I've found and demonstrated a cloning method using hot backups, and it has worked every time I've tried it. This was where I first learned that you must have a hot backup of the redologs when recovering an instance from hot backups while rebuilding the controlfile.

And then one of my Oracle 8.0.5 users on yet another system heard that I had a generic hotbackup script that could do instance recovery and complained about their twice-weekly cold backups, so the script gets used yet again. It's rather generic now, and quite polished if I do say so myself (rather spiffy, actually). I've tested recoveries with the hot redologs about 20 times so far; I've yet to have a problem.

I need to clone from time to time. I like to do it with hot backups. Here is a way that it can be done. If you have a better way, I'm all ears.

And yes, I am also including a standard controlfile backup in the orthodox and endorsed manner, just in case things go awry.
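
Concretely, that is the usual pair, exactly as the attached script issues them near the end:

        ALTER DATABASE BACKUP CONTROLFILE TO '/pkg/ntback/backup/CONTROL.BAK';
        ALTER DATABASE BACKUP CONTROLFILE TO TRACE;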

> > But, as I said, *YOU CANNOT CREATE A NEW SET OF CONTROLFILES
> > WITHOUT THE REDOLOGS*, which implies to me that cloning from
> > a set of hotbackups is not supported with a controlfile
> > trace. If this is so, then what is the point of hotbackups?
 

> Er, because they permit total or incomplete data recovery without having to
> shut down the database first. Mirroring is the protection for online redo
> logs, not backing up.

That's just daft. You should be able to back up everything.

> > It might as well be Microsoft Notepad with a SQL interface.
> > Oracle's tremendous flexibility with datafile manipulation is
> > negated by the lack of proper management of this single component.
 

> I rather think the problem is your expectations, not Oracle, at least in
> this particular regard. You are trying to conflate the issues of hot backup
> and cloning. They are two separate issues, and the techniques that apply in
> the one don't apply in the other.

You should be able to clone without shutting down. Practically, you can. Oracle should endorse a method for doing so. It is a GREAT flaw.

> > Whether I am cloning or backing up to tape, I want to hold
> > in my hand all required components to COMPLETELY restore
> > the database. 99% is not satisfactory.
 

> Define your terms. You start out wanting to clone a database, and now
> you're talking about wanting to do a complete restore. You can't have it
> both ways (or rather, you can, provided you do two different sorts of
> things).

I don't agree, but now you can see why. I have it both ways. I just want a few voices saying that it will work in general. Voices from Oracle support will be best, but hey, I'm not picky.

> If you really want to clone a database without the iffyness of having to
> shut it down first, why not investigate transportable tablespaces?

Oracle 7, dude.

> > > I don't understand why people are so evasive on this issue.
> > > I don't understand why certain people can't simply approach this issue
> > > with logic and clarity, instead of seeking to do the impossible by
> > > utterly illogical means.
 

> > Yes, at this point, it appears that cloning from hot backups
> > using a controlfile trace is impossible FROM AN ORACLE
> > SUPPORT PERSPECTIVE, nevermind that it is demonstrated to
> > work in practice in at least certain situations (actually, in
> > every situation that I've seen).
 

> Well, you beat me there. I've never seen it done hot in practice, and
> theory suggests that it isn't possible. There are, however, hidden
> parameters that will allow you to open any database with inconsistent redo
> logs. The particular one escapes me for now, but even if I could remember
> it, I wouldn't vouch for the results afterwards. And the end result would
> most definitely NOT be a "clone" in the strict sense of the word.

I didn't use any hidden parameters, incantations, virgin sacrifices, secret compacts or covenants, or any other form of divine intervention.

Do you want the screen captures?

> > The third Oracle support technician that I spoke with slyly
> > implied that this method would never cause trouble, so I've
> > heard answers all over the map on this question, most of them
> > wrong (as I previously demonstrated). Pardon me for my scepticism.
 

> > Honestly, I don't understand why this process isn't better
> > addressed in the documentation and better understood by
> > support personnel. What is the purpose of a hotbackup if you
> > can't restore it?
 

> I really don't understand your problem! Hot backups are taken to ensure
> complete database recovery in the event of media failure, or incomplete
> recovery in the event of user error, without the need to shutdown the entire
> database whilst taking the backup. The resulting backup set is internally
> inconsistent, but the application of redo will make it consistent. Since
> you cannot apply redo to redo logs, hot copied redo logs will remain
> internally inconsistent. That's perfectly straightforward, and easy to live
> with, I should have thought.

As far as I know, redo has a more cohesive structure and linear order, which makes it somewhat less delicate than the standard datafiles. More can be assumed about a redolog in the recovery process than about the other types of Oracle data.

But all of this verbiage is just hand-waving; the proof is in the pudding.

> You want to clone a database? Then follow the well-documented cloning
> procedures (which involve a cold copy of all datafiles and redo logs).

Umm, I like my weekends, thanks.

> I may be wrong, and I don't have a database to hand on which to test this
> out, but since the commands to create or drop redo logs are all variations
> on a theme of 'alter database', I would have thought that they can be issued
> in the mount stage - which means you can take your binary version of the
> control file, created as you just described, get to the mount stage, add two
> new redo log groups, drop all reference to the existing groups, and alter
> database open resetlogs. The trace file method of course relies on the
> (consistent) presence of the existing logs, because the controlfile has to
> get its SCN from somewhere. Using the binary file, you're just performing
> boring old incomplete recovery because of a gap in your redo stream.
> I wouldn't consider the resulting database a clone, though, because of that
> resetlogs. But your view might differ.

One of the benefits of my cloning approach is that a resetlogs does not seem to be required.

> > Of course, I would rather not do it this way -
> > the controlfile trace will allow me to make many changes to the
> > positioning of the datafiles in one go, including the system
> > tablespace. But assuming that I am willing to endure the lesser
> > quality, how are the redologs recreated?
 

> Alter database add logfile group 3 'blah/blah.rdo' size 1m (repeat for group 4)
> Alter database drop logfile group 1 (repeat for group 2)

I will give this a try tomorrow.
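
If I've read your recipe correctly, the session would look something like this (group numbers, size, and paths are invented for the sake of the example, and I won't vouch for any of it until I've actually run it):

        STARTUP MOUNT
        ALTER DATABASE ADD LOGFILE GROUP 3 '/u02/clone/redo03.rdo' SIZE 1M;
        ALTER DATABASE ADD LOGFILE GROUP 4 '/u02/clone/redo04.rdo' SIZE 1M;
        ALTER DATABASE DROP LOGFILE GROUP 1;
        ALTER DATABASE DROP LOGFILE GROUP 2;
        RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
        ALTER DATABASE OPEN RESETLOGS;

with the RECOVER being the "boring old incomplete recovery" you mention.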


#!/usr/bin/ksh

# HOTBACKUP - Oracle hot backup script
#   This script will make a compressed hot disk backup of the Oracle data
#   files for the desired instance. It will also compress and move archivelogs
#   into the backup area (it creates an "arch" directory to hold these logs).
#   This approach is flexible, but can be very time-consuming.

# **Instructions for performing a full restore of this type of backup are
# **found at the end of the script.

# Since this script contains the text of the SYS password, it should be set
# with permissions of 500 in a private area.

# This script utilizes a number of Korn shell features; it will not work
# properly with the Bourne or other limited shells.

#   This script utilizes Korn shell arrays. Some versions of the Korn shell
#   limit arrays to no more than 1024 elements; that is the maximum number
#   of Oracle data files that can be processed on those platforms.
#   Solaris and HP-UX have /usr/dt/bin/dtksh, which is a much newer Korn shell.

# The source for the Korn shell can now be downloaded from kornshell.com.

export ORACLE_SID=prdqry
export ORACLE_HOME=/pkg/prdqry/oracle

# The Oracle connect string. Under Oracle 8 and above, if you are the Oracle
# user, you can simply "connect internal" without a password.
CONNECT="connect sys/xxxxxx"

# List the fully qualified path names of your init(sid).ora and related files.
INITORA="/pkg/prdqry/oracle/dbs/initprdqry.ora /pkg/prdqry/oracle/dbs/configprdqry.ora"

# Program to use for data compression
# Suggestions: "compress" (fast), "gzip -9q" (medium), "bzip2 -9" (slow)
COMPRESS="/usr/contrib/bin/gzip -9q"

# Desired destination of backup files
DEST=/pkg/ntback/backup

# Fully qualified pathname for the sqlplus command
SQLPLUS=/pkg/prdqry/oracle/bin/sqlplus

# Oracle archivelog directory
ARCH=/pkg/prdqry/oracle/arch

# Temporary holding area in the same filesystem as the archivelog directory
ARCH2=/pkg/prdqry/oracle/arch2

# Directory containing output of 'ALTER DATABASE BACKUP CONTROLFILE TO TRACE'
UDUMP=/pkg/prdqry/oracle/udump

# Suffix character for compressed files (must not be the same as arch logs).
SUFFIX=zZ2

# -----------------------------------------------------------------------------

integer NFILES=0 INDEX=0

# Clean out any remnants of a previous backup.
cd $DEST
rm -rf *

# Copy the init(sid).ora files.
cp $INITORA $DEST

# Start sqlplus as a Korn shell coprocess to generate a list of datafiles.
$SQLPLUS /nolog |&

# Stuff sqlplus' input - it is easy to deadlock a coprocess; "quit" is critical.
print -p "$CONNECT

	set sqlprompt ''
	set sqlnumber off
	set trimout on
	set head on
	set pagesize 2048
	set linesize 2048

	SELECT
		'+ '||RTRIM(tablespace_name), RTRIM(file_name)
	FROM
		sys.dba_data_files
	ORDER BY
		file_name;
	quit"


# Process the output of sqlplus.
while read -p line
do

# Any line not beginning with "+" is to be ignored.
# Any line beginning with "+" is stripped of the "+" and appended to the array.

	case $line in
		+*)
			DFLIST[NFILES]=${line#+}

			NFILES=$((NFILES + 1))

			;;
	esac

done

# The array of tablespaces and datafiles is now constructed.

function backup_datafile
{

# This shell function will make a hot backup of a single Oracle data file.
# It takes two arguments - the Oracle tablespace name, and the fully-qualified
# path of the datafile.

	print "$CONNECT
			ALTER TABLESPACE $1 BEGIN BACKUP;
			host cp $2 $DEST
			ALTER TABLESPACE $1 END BACKUP;
			quit" \
		| $SQLPLUS /nolog > /dev/null 2>&1

# If enough disk space is available, the compress operation can be moved
# outside of the function, or eliminated altogether.

        $COMPRESS $DEST/$(basename $2)
}

while (( INDEX < NFILES ))
do

# Even though there is only one argument here, it becomes two in the function.

        backup_datafile ${DFLIST[INDEX]}

        INDEX=$((INDEX + 1))
done

# Force a write of the current redo.
print "$CONNECT

		ALTER SYSTEM ARCHIVE LOG CURRENT;
		quit" | \
	$SQLPLUS /nolog > /dev/null 2>&1


# Copy the online redo logs. Oracle says that you're not supposed to do this.
# The "Oracle8 Backup and Recovery Handbook" (ISBN 0-07-882389-7) has the
# following warning on page 94: "You should never take a backup of the online
# redo log files, as the online log file has the 'end of backup' marker and
# would cause corruptions if used during recovery."
# This is the only way that I've made a complete restore work.
$SQLPLUS /nolog |&

# The ugly SQL below selects only a single member from each group.
print -p "$CONNECT

	set sqlprompt ''
	set sqlnumber off
	set trimout on
	set head on
	set pagesize 2048
	set linesize 2048

	SELECT
		'+ '|| RTRIM(x.member)
	FROM
		v\$logfile x
	WHERE
		x.member IN
		(
			SELECT
				member
			FROM
				v\$logfile
			WHERE
				group# = x.group# AND rownum < 2
		);
	quit"


# Process the output of sqlplus.
while read -p line
do

# Any line not beginning with "+" is to be ignored.
# Any line beginning with "+" is a redolog file to be copied and compressed.

	case $line in
		+*)
			cp ${line#+} $DEST
			$COMPRESS $DEST/$(basename ${line#+})
			;;
	esac

done

# Now attend to the archive logs.
cd $ARCH

# Move the archivelogs to the temporary area.
# Note: "mv * $ARCH2" may not work, since the argument list might be too large.
for x in *
do

        mv $x $ARCH2
done

# If Oracle was in the act of writing an archivelog, give it time to finish.
sleep 30

cd $ARCH2

# Compress all the archivelogs. Do this with cron on a regular basis as well.
for x in *[!$SUFFIX]
do

        $COMPRESS $x
done
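
# For example, a crontab entry along these lines would keep up with the
# compression between backups (illustrative only - adjust the schedule and
# paths, and take care not to touch a log that ARCH is still writing):
#
#   30 1 * * * cd /pkg/prdqry/oracle/arch && /usr/contrib/bin/gzip -9q *[!zZ2]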

mkdir $DEST/arch

# Move the archivelogs to the destination area.
for x in *
do

        mv $x $DEST/arch
done

# Now attend to the controlfile. Make a backup copy and a trace.
cd $UDUMP

# Locate and delete old outputs of 'ALTER DATABASE BACKUP CONTROLFILE TO TRACE'.
CONTROL=$(grep -l 'CREATE CONTROLFILE' *.trc)

if [[ ! -z "$CONTROL" ]]
then

        rm -f $CONTROL
fi

chown oracle:dba $DEST

print "$CONNECT

		ALTER DATABASE BACKUP CONTROLFILE TO '$DEST/CONTROL.BAK';
		ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
		quit"| \
	$SQLPLUS /nolog > /dev/null 2>&1

chown root:root $DEST

CONTROL=$(grep -l 'CREATE CONTROLFILE' *.trc)

mv $CONTROL $DEST

chmod -R 600 $DEST

chmod 700 $DEST $DEST/arch

# Restore Instructions

#  If you have simply lost or corrupted a data file, restore the backup data
#  file, plus all archive logs produced after the timestamp on the file.
#  With this material in place, you can run
#  "RECOVER AUTOMATIC DATAFILE '/path/to/datafile.dbf';"

#  If you need to recover the entire database, load the *exact same* Oracle
#  software binary image, restore the contents of the backup, then edit the
#  file named "ora*.trc". Remove everything except for the "CREATE CONTROLFILE",
#  modify the paths if necessary, then enter svrmgrl (or sqlplus on 8i+),
#  do a "startup nomount", then run the "CREATE CONTROLFILE". After a delay,
# do a "RECOVER AUTOMATIC DATABASE;" and after another delay, enter # 'ALTER DATABASE OPEN;' and your database should come online.

Charles J. Fisher - Consultant
Alcoa Davenport Works
(319) 459-2512
