Online (hot) Backups - Gotcha!

From: Kent Palm <kpalm_at_doc.qualcomm.com>
Date: Fri, 5 Feb 1993 21:08:20 GMT
Message-ID: <kpalm.728946500_at_doc>


I lost one of my databases a few weeks ago and was unable to recover from my online backup (instead, I had to use my nightly export) because of a programming error introduced in my backup script.

Recovery from my online backup was going well. My o/s files were restored and rolling forward was progressing without fail...until I started to roll forward my last offline redo log. That's when I got an unrecoverable error. It turns out that this redo file was a partial copy. The backup script must have been copying the file to tape just as ARCH was filling it up. This error went undetected in my previous backup and recovery tests because ARCH wasn't busy at the right time.

I believe the problem was due to the line

   FILES=`ls /backup/arch*.dbf`

which is documented in Oracle for UNIX, Technical Reference Guide, v.6.0 on page 3-16 (Hot Backup Example Script). This line does not take into account whether ARCH has finished writing the file. Then the line

   rm -f $FILES

does you in (this line is okay provided you don't have a partial file in $FILES). But if you do, then you're hosed. The fix, which I have applied, is to take file size into consideration. By using

   FILES=`find /backup/arch*.dbf -size 10535424c -print`

the file size is taken into consideration and partial files are left alone (i.e., not prematurely copied and removed). You may need to escape the * (i.e., \*), and the size is dependent on the final size of your offline redo logs (which should be identical to your online redo logs).
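To make the fix concrete, here is a minimal sketch of how the selection step might look in a Bourne shell backup script. The helper name select_full_logs is my own invention (it is not in the Oracle example script), and the directory and size are assumptions you must adapt to your site:

```shell
#!/bin/sh
# select_full_logs DIR SIZE: print only the archive logs ARCH has finished
# writing.  A completed offline redo log is exactly SIZE bytes (the size of
# your online redo logs), so anything smaller is still being filled and is
# skipped until the next backup pass.
select_full_logs () {
    find "$1" -name 'arch*.dbf' -size "${2}c" -print
}

# In the backup script, replace  FILES=`ls /backup/arch*.dbf`  with:
FILES=`select_full_logs /backup 10535424`

# The existing  tar ... $FILES  and  rm -f $FILES  lines are then safe,
# because a partial log never appears in $FILES.
```

Using -name with a quoted pattern also sidesteps the shell-expansion issue that makes escaping the * necessary in the one-liner form.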

So, for those of you running online backups, gotcha?

Kent Palm
619/597-5420
