RMAN and NetBackup Performance

From: Mark Strickland <strickland.mark_at_gmail.com>
Date: Tue, 6 Mar 2007 16:16:42 -0800
Message-ID: <90ad14210703061616r594a5ebfw36706a5c8c876eaa@mail.gmail.com>

Oracle 10.1.0.5 RAC on Solaris 9
Symantec/Veritas NetBackup 5

Anyone out there very experienced with managing RMAN with NetBackup?

Starting the evening of February 19th, the duration of the backups to tape of the Production Flash Recovery Area suddenly jumped from about half an hour on average to 2-3 hours. These are level 1 incrementals. The actual backup time for each backupset is still 1-2 minutes, as always, but there is now a 5-7 minute delay between backupsets. Nothing has changed in the number or size of the backupsets or in the total size of the backups.

We opened a ticket with Symantec and were instructed to turn on verbose logging for the various NetBackup processes on the RMAN client database server and the NetBackup master server. I've become obscenely intimate with verbose NetBackup logs over the last two weeks. So far, we're not getting very far with Symantec. Their one contribution has been to suggest that we explicitly set the format for the RMAN backups with a %t at the end, which is apparently supposed to improve the performance of NetBackup catalog lookups. The RMAN docs say that if the format statement is used, Oracle will not manage the Flash Recovery Area automatically. So that idea's out; I don't want to manage the FRA manually.

After poring over the NetBackup logs, we've determined the following:

The NetBackup image database for this particular RMAN client is quite large, with about 42,000 image files totalling 47 GB. During each backup of an RMAN backupset, the image database is searched to see whether a record for that backupset already exists. The search starts with the most recent image file and works backward sequentially, one by one, through all 42,000 image files to the oldest (90 days old, the retention period), even after it has already found the record for the backupset. That takes about 6 minutes, which is long enough for the NetBackup Scheduler to wake up and grab the opportunity to back up the NetBackup catalog. Also, during those 6 minutes, the Media_Unmount_Delay reaches its default 180-second timeout, so NetBackup decides that the tape is no longer needed in the drive and ejects it. By the time the catalog search has come back and the same tape has been re-mounted, 15 minutes have passed. Symantec had us disable automatic catalog backups and schedule them explicitly instead, which removed 6-8 minutes from the delay for each backupset. What is left is the 6-7 minutes of searching the image database and remounting the tape.
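
For illustration, the search behavior described above can be modeled with a short Python sketch. This is a toy model, not NetBackup code: the function names, the uniform per-image cost, and the position of the matching record are assumptions. Only the 42,000 image count, the roughly 6-minute full scan, and the 180-second Media_Unmount_Delay come from the description above.

```python
IMAGE_COUNT = 42_000           # image files for this client (from the post)
MEDIA_UNMOUNT_DELAY_S = 180    # default unmount timeout (from the post)
FULL_SCAN_S = 360              # full scan takes about 6 minutes (from the post)

# Assumption: every image file costs the same amount of time to examine.
SECONDS_PER_IMAGE = FULL_SCAN_S / IMAGE_COUNT


def scan(images, target, stop_on_match):
    """Walk the images newest-first and return (found, images_examined)."""
    found = False
    examined = 0
    for image in images:
        examined += 1
        if image == target:
            found = True
            if stop_on_match:
                break  # an early exit would end the search here
    return found, examined


def tape_unmounted(examined):
    """True if the scan outlasts the unmount delay, so the tape is ejected."""
    return examined * SECONDS_PER_IMAGE > MEDIA_UNMOUNT_DELAY_S


images = range(IMAGE_COUNT)    # index 0 is the most recent image
target = 10                    # hypothetical: the record sits near the newest end

# Observed behavior: the scan keeps going after the record is found.
found_full, n_full = scan(images, target, stop_on_match=False)
# Hypothetical alternative: stop as soon as the record is found.
found_early, n_early = scan(images, target, stop_on_match=True)

print(n_full, tape_unmounted(n_full))
print(n_early, tape_unmounted(n_early))
```

Under these assumptions the exhaustive scan examines every image and outlasts the unmount delay, while a scan that stopped at the first match would finish long before the tape was ejected.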

I can find nothing that changed on February 19th in our environment. The morning backups were of normal duration, the evening backups were not, and the backups have been slow ever since. This is especially annoying because in early February I increased MAXSETSIZE to get more data files into each backupset and reduce the number of backupsets from 100 to 15, which cut the backups to tape from 3-1/2 hours to 30 minutes. We were quite enjoying that improvement.
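
As a back-of-the-envelope check, the figures above can be combined in a few lines of Python. This is only a rough sketch: it assumes 15 backupsets per run, one delay between each consecutive pair of backupsets, and nothing else contributing to the elapsed time. The function name is made up for the example.

```python
BACKUPSETS = 15                # backupsets per run (assumed from the MAXSETSIZE change)
BACKUP_MIN = (1, 2)            # minutes to back up one backupset (low, high)
DELAY_MIN = (5, 7)             # minutes of delay between backupsets (low, high)


def total_minutes(backupsets, backup_min, delay_min):
    """Total run time: all the backups plus one delay between each pair."""
    gaps = backupsets - 1
    return backupsets * backup_min + gaps * delay_min


# Before February 19th: no delay between backupsets.
before = tuple(total_minutes(BACKUPSETS, b, 0) for b in BACKUP_MIN)
# Since February 19th: a 5-7 minute delay between backupsets.
after = tuple(total_minutes(BACKUPSETS, b, d) for b, d in zip(BACKUP_MIN, DELAY_MIN))

print(f"before: {before[0]}-{before[1]} minutes")   # before: 15-30 minutes
print(f"after:  {after[0]}-{after[1]} minutes")     # after:  85-128 minutes
```

With these assumptions the delays alone push a 15-30 minute run to roughly 1-1/2 to 2 hours, which accounts for most, though not all, of the jump described above.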

I've Googled and I've searched the Symantec site for clues. Nothing so far. Any ideas?

Regards,
Mark Strickland
Seattle, WA

--
http://www.freelists.org/webpage/oracle-l

Received on Tue Mar 06 2007 - 18:16:42 CST