Checking that I can restore from RMAN backups (if I need to)

From: <tony_at_aberdour.demon.co.uk>
Date: 8 Oct 2006 13:35:04 -0700
Message-ID: <1160339704.547301.37250@m73g2000cwd.googlegroups.com>

My boss asked me if I could be sure that my backups were "adequate" to enable me to recover from them. Well, as I am a DBA it seems like a reasonable question doesn't it?

The problem is we have 2,300 databases (that's the supported ones at any rate. God only knows how many test systems there are but they aren't my problem).

It seems that RMAN is the flavour of the month for backups and it was suggested to me that I consider three main criteria:

There exists a backup of each datafile that is at most 'm' days old.
There is a full backup of each datafile that is at most 'n' days old.
The archived logs (on tape or on disk) form an unbroken sequence
(i.e. you can use them all for rolling forward in a recovery situation)
starting from the last backup.

To explain the difference betwen 'm' and 'n': If 'm' was 2 days and 'n' was 7 days, and a level 1 incremental backup was taken 1 day ago and a level 0 (full) backup was taken 6 days ago, we would be in good shape. If incrementals are not in use then a full backup must be at most 2 days old.

Now, I didn't quite fancy logging into 2,300 hosts every morning to check the backups so it seemed that a wee script was in order (I am not Scottish, but I live there and it seems to be rubbing off on me).

In short order, I worked out that the preferred solution was to use various RMAN CROSSCHECK commands and then to start querying the various controlfile V$ views.

OK:

CROSSCHECK BACKUP...
CROSSCHECK COPY....
CROSSCHECK ARCHIVELOG AFTER..... seemed a good start. Now what about those v$ views?

In 10gR2, there is a view V$BACKUP_DATAFILE_DETAILS which seems to make it reasonably easy to answer the first two questions (in prior releases V$BACKUP_DATAFILE is sufficient) and it seemed to me I was well on my way. Now for requirement three. After a short time I came up with this query:

select 'Archive log gaps between sequence ' || previous_log || ' and ' || current_log || ' for thread# ' || thread# message from
(select thread#,sequence# current_log, lag(sequence#) over (partition
by thread# order by sequence#) previous_log from v$archived_log where status ='A')
where previous_log < current_log-1

A nice, elegant query I thought to myself. Quite proud, I was, to have utilised an analytic function on a controlfile query!

Anyway, I soon realised that I had to make the query a little more complex to restrict the archivelogs under consideration to those strictly needed for recovery. I needed to determine the latest backup of each datafile and query its checkpoint_change#. I then had to take the minimum of these checkpoint_change#s and that should be the point from which I would need to roll forward. I then had to determine the starting sequence# for each thread based on that checkpoint_change#.

Undeterred, I soon I had my draft code in full screen mode admiring my handywork, and feeling quite pleased with myself I must say. Then I asked myself "What have I forgotten?".

Well, what about "READ ONLY" tablespaces? The checkpoint_change# in these datafiles may be ancient, but that does not mean we have any recoverability issues. But of course I DO need to check for cases where people have put tablespaces into READ ONLY mode and then immediately stopped backing them up. I needed some special treatment here.

OK, my simple SQL statement has now grown into quite a big piece of PL/SQL, and that is even before I try and deal with the 9i/10g differences. But hold on. What about RAC?

Sure, I have checked that there is an unborken sequence for each thread. But what about dependencies between threads? Come to think about it: What if I am missing ALL the archivelogs for a thread? Is that because that one instance has been down for a week, or is it because all the archivelogs have been lost?

At this point I started saying to myself that I must be on the wrong track. There must be an easier way. However, I have been unable to locate any discussion on this topic in any books or notes conferences. Surely EVERYBODY wants to know if they can recover their databases from backup don't they?

So the question is: "How are you supposed to do that?"

Any help would be appreciated. Received on Sun Oct 08 2006 - 15:35:04 CDT