Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: Tuning RMAN backup and recovery

From: Mark Brinsmead <pythianbrinsmead_at_gmail.com>
Date: Sun, 25 Nov 2007 08:58:17 -0700
Message-ID: <cf3341710711250758n7e512b7en547e41495b23c4b1@mail.gmail.com>


I may be joining this thread a little late, but oh well. Perhaps I can still add something to the discussion.

Just to summarize Don's situation:



Don is using RMAN to back up a database of about 860GB. The backups take more than 10 hours; that is less than 86GB/hr, or roughly 24 MB/s in aggregate.

The backup is written to disk in /rman, a Veritas filesystem.

The RMAN backup uses 4 concurrent threads, with compression.

Don is unsure of the underlying disk configuration (RAID-1 vs. RAID-5, how many spindles, etc.) but is reasonably sure that /rman shares physical spindles with the database.

Don's "sar" statistics show that during the backup, the system is completely "busy", spending about 30% of its time in CPU, and 70% waiting on I/O.
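
For reference, a backup configured this way presumably looks something like the following RMAN sketch (the channel names and format string here are made up for illustration; Don's actual script may differ):

```
RMAN> RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK FORMAT '/rman/%U';
  ALLOCATE CHANNEL c2 DEVICE TYPE DISK FORMAT '/rman/%U';
  ALLOCATE CHANNEL c3 DEVICE TYPE DISK FORMAT '/rman/%U';
  ALLOCATE CHANNEL c4 DEVICE TYPE DISK FORMAT '/rman/%U';
  BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 0 DATABASE;
}
```

Four channels, all reading from and writing to the same spindles -- which is exactly where the contention discussed below comes from.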


Okay, so it looks pretty clear that these backups are I/O bound. It is also highly likely, from what we have been told, that there is substantial I/O contention. There are four concurrent backup threads reading from and writing to the same set of disks. This might also be aggravated by the cost of software-based RAID-5, but we do not actually *know* whether this is the case.

With 10g, RMAN compression can be either a blessing or a curse. In this case, where we are probably (badly) I/O bound, the compression is *probably* beneficial. I think Don has done tests to confirm that, but I'm not certain I have seen that in this thread.

Based on what we have seen, I would think that the very best (or at least, *first*) "optimization" we can apply here is to separate the backup storage from the database storage, onto separate sets of spindles. Do not use RAID-5 for the /rman filesystem, except *maybe* with high-end hardware-supported RAID-5 where sequential writes are recognised and optimised.

Don already plans to re-arrange the /rman storage. This should be done sooner rather than later, I think.

(Note: there are better reasons for rearranging this storage configuration than just performance. In the event of a storage failure, there is a significant risk of losing *both* the database *and* the backups. That would be a "bad thing (tm)".)

While I/O contention remains the main limiting factor for backup performance, RMAN compression is probably going to be a net benefit; the fewer disk blocks *written* by the backup, the fewer counter-productive disk seeks; this leads to less contention and faster throughput.

There is, however, a second potential source of I/O contention -- the parallelism of the backup. In cases where backup parallelism is not well matched to the storage configuration, additional parallelism *harms* throughput.

Don, have you tried your backups with *fewer* parallel threads? This could be a tough thing to balance, but you may find that at least until you separate the backup and database storage, the reduced I/O contention might actually allow you to do your backups faster...
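
Reducing the parallelism is a one-line change, by the way -- something along these lines (the BACKUP TYPE clause shown assumes compressed backupsets, as in Don's setup; adjust to match the existing configuration):

```
RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET;
```

It is cheap to experiment: try 1, 2, and 4 channels and compare wall-clock times.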

Back in the 90's, a typical CPU could "gzip" data with a throughput of around 1.0 MB/s. Current CPUs can do much better. But your backup threads (unless I have botched my arithmetic) are averaging only somewhere around 6 MB/s each. Ignoring RMAN for the moment, how fast can you gzip a 1 GB file? Until your backups are achieving *at least* four times that rate, you can probably assume they are I/O bound.
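
A quick way to put a number on that is something like the following sketch (the sample here is small and random -- random data is nearly incompressible, so treat the result as a worst case for the compressor; use a real 1 GB datafile copy for a fairer test):

```shell
#!/bin/sh
# Rough measurement of raw gzip throughput on this host.
SAMPLE=/tmp/gzip_sample.dat
SIZE_MB=64

# Create a scratch file of random (incompressible) data.
dd if=/dev/urandom of="$SAMPLE" bs=1M count="$SIZE_MB" 2>/dev/null

# Time a single-threaded gzip pass over it.
start=$(date +%s)
gzip -c "$SAMPLE" > "$SAMPLE.gz"
end=$(date +%s)

elapsed=$(( end - start ))
[ "$elapsed" -lt 1 ] && elapsed=1    # guard against sub-second runs
echo "gzip throughput: roughly $(( SIZE_MB / elapsed )) MB/s"

rm -f "$SAMPLE" "$SAMPLE.gz"
```

Compare the reported rate against the per-thread backup rate: if one CPU can gzip several times faster than a backup channel is moving data, the bottleneck is almost certainly the disks, not the compression.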

Anyway, these are a few thoughts on your situation; I hope they are not too random or disjointed. I hope even more that they are helpful. :-)

I think someone earlier in this thread asked about methods to optimize disk-based backups. Aside from the observations offered above, I have only come across one *really* reliable way of doing this -- buy a tape drive! :-) There are very affordable tape drives out there that are capable of sustaining throughputs well in excess of 100MB/s. That's 360 GB/hr. In this particular situation, a $5000 tape drive *could* completely transform your backups. Your only challenge then will be to find a way to keep the tape drive "fed" -- it is common for tape-based backups to suffer performance-wise when data cannot be delivered as fast as the tape drive can take it.
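
Pointing RMAN at tape just means configuring an SBT channel through whatever media manager you run. Purely as an illustration (the PARMS string below is a hypothetical NetBackup-style example; the exact environment variables depend entirely on your media management software):

```
RMAN> CONFIGURE CHANNEL DEVICE TYPE sbt PARMS 'ENV=(NB_ORA_POLICY=oracle_backups)';
RMAN> CONFIGURE DEFAULT DEVICE TYPE TO sbt;
```

With a fast tape drive you would also likely want to *drop* the RMAN compression, since the drive can usually compress in hardware faster than the CPUs can.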

But that is a different discussion, perhaps for a different day...

On Nov 16, 2007 3:22 PM, Don Seiler <don_at_seiler.us> wrote:

> Here's the "sar -u" output from Saturday night and Sunday morning of
> this past weekend when the level 0 database backup was running. I'm
> not sure if you're interested in the -d output, or if you'd rather see
> iostat output.
>
> root_at_foo:/var/log/sa # sar -u -f sa10 -s 22:30:00 -i 900
> Linux 2.6.9-55.0.6.ELsmp (foo.bar.com) 11/10/2007
>
> 10:30:01 PM CPU %user %nice %system %iowait %idle
> 10:45:01 PM all 29.56 0.00 0.90 0.16 69.37
> 11:00:01 PM all 28.41 0.00 0.80 0.11 70.68
> 11:15:01 PM all 29.75 0.00 0.89 0.11 69.25
> 11:30:01 PM all 29.04 0.00 0.87 0.10 69.98
> 11:45:01 PM all 31.25 0.00 0.95 0.10 67.71
> Average: all 29.60 0.00 0.88 0.12 69.40
>
> root_at_foo:/var/log/sa # sar -u -f sa11 -e 06:00:00 -i 900
> Linux 2.6.9-55.0.6.ELsmp (foo.bar.com) 11/11/2007
>
> 12:00:01 AM CPU %user %nice %system %iowait %idle
> 12:15:01 AM all 29.38 0.00 0.99 0.12 69.52
> 12:30:01 AM all 29.57 0.00 0.99 0.24 69.20
> 12:45:01 AM all 27.11 0.00 3.28 5.73 63.88
> 01:00:01 AM all 33.61 0.00 3.82 5.01 57.55
> 01:15:01 AM all 31.57 0.00 3.49 5.60 59.35
> 01:30:01 AM all 27.54 0.00 2.50 4.16 65.80
> 01:45:02 AM all 25.33 0.00 0.95 0.14 73.59
> 02:00:01 AM all 24.30 0.00 0.91 0.12 74.67
> 02:15:01 AM all 25.23 0.00 0.91 0.11 73.75
> 02:30:01 AM all 25.19 0.00 0.94 0.13 73.74
> 02:45:01 AM all 25.77 0.00 2.77 4.45 67.01
> 03:00:01 AM all 26.14 0.00 3.17 5.82 64.87
> 03:15:01 AM all 25.99 0.00 1.84 2.45 69.72
> 03:30:01 AM all 25.67 0.00 0.97 0.13 73.23
> 03:45:01 AM all 24.40 0.00 0.97 0.12 74.51
> 04:00:01 AM all 25.76 0.00 0.97 0.13 73.14
> 04:15:01 AM all 31.83 0.01 1.22 0.49 66.44
> 04:30:01 AM all 27.24 0.00 1.70 0.24 70.82
> 04:45:01 AM all 26.65 0.00 2.59 4.89 65.87
> 05:00:01 AM all 27.05 0.00 3.14 5.93 63.88
> 05:15:01 AM all 26.45 0.00 2.94 5.45 65.16
> 05:30:01 AM all 25.99 0.00 1.05 0.13 72.83
> 05:45:02 AM all 23.22 0.00 0.95 0.13 75.70
> Average: all 27.00 0.00 1.87 2.25 68.88
>
>
> --
> Don Seiler
> http://seilerwerks.wordpress.com
> ultimate: http://www.mufc.us
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

-- 
Cheers,
-- Mark Brinsmead
  Senior DBA,
  The Pythian Group
  http://www.pythian.com/blogs

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Nov 25 2007 - 09:58:17 CST

