Re: Long running backups - OK?
Date: Fri, 26 Jan 2018 09:38:32 -0700
Message-ID: <b99610ab-9a7b-514d-9184-660d10260f6a_at_gmail.com>
Glenn,
Any question about backups should really be converted into a question 
about restore and recovery, because the backups themselves don't matter; 
being able to restore and recover from those backups is what matters.
 
So, to your point, longer-running backups result in longer-running 
recoveries.  An inconsistent or "hot" backup copies a baseline image of 
datafiles, but must also capture all redo generated during that datafile 
backup so that a roll-forward recovery after restore can produce a 
consistent image.  If the datafile backups run longer, then in most 
environments this means more redo must be captured.
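
To put rough numbers on that trade-off, here is a back-of-the-envelope 
sketch in Python; every figure in it is a made-up placeholder, not a 
measurement from any real system:

    # Back-of-the-envelope illustration (all numbers are hypothetical):
    # the longer the backup window, the more redo must be captured and
    # then re-applied during roll-forward recovery.

    backup_hours = 5               # assumed length of the hot backup window
    redo_rate_mb_per_sec = 5       # assumed average redo generation rate
    apply_rate_mb_per_sec = 50     # assumed media-recovery apply rate

    redo_mb = backup_hours * 3600 * redo_rate_mb_per_sec
    extra_recovery_min = redo_mb / apply_rate_mb_per_sec / 60

    print(f"Redo generated during the backup window: {redo_mb / 1024:.1f} GB")
    print(f"Extra roll-forward time after restore:  ~{extra_recovery_min:.0f} minutes")

In other words, the longer the backup window and the busier the 
database, the more redo has to be re-applied before the restored 
datafiles reach a consistent state.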
 
So the answer to your question about whether longer-running backups 
matter really depends on whether your organization can tolerate 
longer-running recoveries.
 
To help determine why your backups to an NFS mount are taking longer, 
are the NFS clients (i.e. database servers) configured appropriately for 
NFS activity?  Specifically, assuming your database servers are Linux, 
have you adjusted the TCP kernel settings to increase memory buffers for 
the increased data traffic across the network?
 
Again, assuming you are on Linux, to determine if there is bottlenecking 
on memory buffers with the NFS client, please consider downloading the 
Python script nfsiostat.py from 
<https://fossies.org/linux/nfs-utils/tools/nfs-iostat/nfs-iostat.py>.  
This script simply calls the "nfsiostat" command from the Linux project 
"nfs-utils", but it reformats the output to be more useful and 
intuitive.  Specifically, it categorizes total NFS time into "average 
queue time" and "average RTT time".  Total NFS time is the average 
elapsed time the application sees for the NFS call.  Average queue time 
is time spent queuing the NFS request internally within the NFS client 
host.  Average RTT time is time spent on the network round trip; this 
includes the time spent on the wire and the time spent on the NFS 
server performing the underlying I/O.
 
If Average Queue Time from nfsiostat.py shows up as anything more than 
an inconsequential component of total NFS time, then it might be useful to 
enlarge the TCP send and receive buffers, which by default are 
insufficient for the heavy volumes of network I/O resulting from NFS.
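
For reference, on Linux the relevant limits are exposed under 
/proc/sys/net.  The following is a minimal, Linux-only sketch in Python 
for eyeballing the current maxima; the 16 MB threshold is purely an 
illustrative placeholder, not a tuning recommendation:

    # Print the Linux TCP buffer limits most relevant to NFS traffic.
    # The 16 MB "floor" is only an illustrative placeholder, not a
    # recommendation for any particular environment.

    from pathlib import Path

    PARAMS = [
        "/proc/sys/net/core/rmem_max",   # max socket receive buffer (bytes)
        "/proc/sys/net/core/wmem_max",   # max socket send buffer (bytes)
        "/proc/sys/net/ipv4/tcp_rmem",   # min/default/max TCP receive buffer
        "/proc/sys/net/ipv4/tcp_wmem",   # min/default/max TCP send buffer
    ]

    ILLUSTRATIVE_FLOOR = 16 * 1024 * 1024   # 16 MB, placeholder only

    for p in PARAMS:
        fields = Path(p).read_text().split()
        current_max = int(fields[-1])        # last field is the maximum
        note = "" if current_max >= ILLUSTRATIVE_FLOOR else "  <-- looks small"
        print(f"{p}: {' '.join(fields)}{note}")

Any permanent change would go through sysctl settings owned by your 
sysadmins; the articles linked below discuss sensible values.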
 
This article 
<https://wwwx.cs.unc.edu/%7Esparkst/howto/network_tuning.php> provides a 
decent explanation of the TCP kernel settings.
 
This Delphix documentation 
<https://docs.delphix.com/docs/system-administration/performance-tuning-configuration-and-analytics/target-host-os-and-database-configuration-options> 
provides some good recommendations for optimizing NFS clients on various 
OS platforms, such as Solaris, Linux, AIX, and HP-UX.
 
If Average Queue Time from nfsiostat.py still shows up as a substantial 
portion of total NFS time even after increasing the TCP send and receive 
buffers, then there may be another problem within the OS, and it would 
be worthwhile to open a support case with your vendor.
 
Average RTT Time covers a great deal of territory, encompassing the 
entire network as well as the performance of the NFS server itself. 
Diagnosing RTT involves gathering information on the latency and 
throughput of the network, the number of network hops, and whether there 
are intermediate devices that can increase latency and/or reduce 
throughput (e.g. firewalls).  And diagnosing RTT may also mean 
diagnosing the performance of the NFS server and its underlying storage.
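
As a very crude first look at just the network leg, a sketch like the 
one below times repeated TCP connections to the NFS server's port 2049.  
The hostname is a placeholder, and this measures only connection setup, 
not actual NFS I/O, so treat it as a sanity check rather than a diagnosis:

    # Crude latency probe: time TCP connection setup to the NFS port (2049).
    # The hostname below is a placeholder for your NFS server; this does
    # not measure NFS I/O itself, only network round trips for connect().

    import socket
    import time

    NFS_SERVER = "nfs-server.example.com"   # placeholder hostname
    PORT = 2049
    SAMPLES = 10

    times_ms = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        with socket.create_connection((NFS_SERVER, PORT), timeout=5):
            pass
        times_ms.append((time.perf_counter() - start) * 1000)

    avg = sum(times_ms) / len(times_ms)
    print(f"TCP connect to {NFS_SERVER}:{PORT}, {SAMPLES} samples:")
    print(f"  min {min(times_ms):.2f} ms   avg {avg:.2f} ms   max {max(times_ms):.2f} ms")

For anything beyond that, tools like traceroute and the network teams' 
own monitoring are the right place to look.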
 
I guess the message here is that tuning NFS involves understanding the 
components and rigorously diagnosing each step.  Obviously this email is 
long enough as it is, and I could go on for hours.
 
Hope this helps...
 
-Tim
 
On 1/26/18 07:53, Glenn Travis wrote:
>
> Lively discussion among our team regarding backup run times.  We are 
> using RMAN and recently migrated from tape to disk (NFS mounted) based 
> backups.  The backups are taking 2-3 times longer (up to 5 times 
> longer when concurrent).  Throughput dropped from 200-300mb/sec to 
> 50-70mb/sec.  We are investigating the performance issues but the 
> discussion changed to ‘Does it really matter?’
>
> So I wanted to throw out these questions for your opinions.  If a 
> database backup is running and not adversely affecting system (and 
> user’s applications’ performance), does it really matter how long it runs?
>
> Are there any negatives to having an Oracle backup run for over x 
> hours? Say a 5 hour (or longer) backup on an active database?
> What are the ramifications of long-running Oracle database backups, if 
> any?
>
> Note we have dozens of databases over 1tb and run fulls weekly, 
> cumulatives (inc level 1) daily, archivelogs hourly.
>
> I just can’t deal with a backup running for a quarter of the day.  
> Seems to be a long window of exposure and vulnerability should a crash 
> occur.
>
> Thoughts?
>
> *Glenn Travis*
>
> DBA ▪ Database Services
>
> IT Enterprise Solutions
>
> SAS Institute
>
-- http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 26 2018 - 17:38:32 CET
