RE: RTO Challenges

From: Matthew Parker
Date: Sat, 31 Mar 2018 21:57:44 -0700
Message-ID: <0cf001d3c975$f97fb850$ec7f28f0$>

As Niall stated earlier, this is really a business problem/decision, not a technical problem. The business decides what it requires, and it is your and the infrastructure team's responsibility to provide the details and costing needed to achieve that objective (which we will talk about in a second email). The business can then decide what it wants to do, and in some cases you will find that the business makes other choices.

I had one recent client whose backups/snapshots were impacting a 24x7, human-utilized system, so the business opted for no backups/snapshots at all: if a failure happened, they would rebuild from scratch on a new server. Under certain insurance compliance rules, this specific system was required to keep all original incoming data (nicely partitioned by customer), so the business could reload any data stream it wanted and prioritize bringing data back online to do work. (They didn't have to wait for the full system to be restored to its original state, but could piecemeal the restore at their leisure.)

Most of us, of course, don't have many databases that meet that requirement, but once the problem and the costing are explained, the business will make a choice in the end, and it may not be the choice you would have chosen.

Each company has a cost-benefit analysis as to the risk the business is willing to incur versus the amount of money it is willing to spend to decrease or eliminate that risk. If you get to participate in those conversations, you will sometimes be surprised at where the business draws the line between cost and risk; the result can seem surreal. That decision normally goes to the C-level, and you will find that, say, the CFO has a completely different opinion on risk versus cost than someone lower in the organization.

Matthew Parker

Chief Technologist

Dimensional DBA

Oracle Gold Partner

425-891-7934 (cell)

D&B 047931344

CAGE 7J5S7

View Matthew Parker's profile on LinkedIn

From: Dominic Brooks
Sent: Saturday, March 31, 2018 1:23 PM
Subject: Re: RTO Challenges  

Thanks all for replies.

Details appreciated.  

I will have little influence on changing the status quo. We already have RAC and Data Guard. Another standby with delayed replication could be useful, but it sounds like, if we need ALL scenarios to be met by a 2-hour RTO, some scenarios will always involve a restore from backup, and so the consequences are inevitable at the rates we're quoted (which I can't influence).
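For context, the "standby with delayed replication" idea above is a standard Data Guard option. A minimal sketch, assuming a standby with the DB_UNIQUE_NAME `stby_delayed` (all names and the 4-hour delay here are placeholders, not anything from the thread):

```sql
-- On the primary: ship redo to the standby but delay its application
-- by 240 minutes, giving a window to halt apply before, e.g., a
-- logical corruption or bad batch job replays on the standby.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=stby_delayed DELAY=240 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stby_delayed'
  SCOPE=BOTH;

-- On the standby: start managed recovery WITHOUT real-time apply,
-- so the DELAY attribute is honored.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```

Note that starting apply with real-time apply (or with NODELAY) bypasses the delay, which is also how you would catch the standby up when you actually need to fail over to it.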

We were already looking at sharding for various reasons - pros and cons, as with everything - and some RTO scenarios are just another box that looks tickable with sharding.


Sent from my iPhone

On 31 Mar 2018, at 20:50, Mladen Gogala wrote:

Dominic, by far the most frequent trick for fast recovery of even the largest databases is a snapshot revert. Of course, snapshots are not backups and have significant downsides, but you can have a reasonable snapshot retention policy of 24 hours and keep a normal streaming backup as the last line of defence. The first line of defence is RAC; the next line of defence is the standby DB; the next line of defence is snapshots; and the last line of defence is always backup. The speed of 2 TB per hour doesn't sound too good. With dedicated 10 Gb Ethernet, Commvault 11 can achieve between 3.5 TB/hr and 4 TB/hr, depending on the degree of deduplication. If you use 2 bonded 10 Gb wires, you can expect around 7 TB/hr. With the backup storage attached through a 16 Gb/s FC adapter, the backup speed shoots up to 20 TB/hr. Even a 100 TB monster can be backed up in 5 hours and restored in about 7 hours.
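As a back-of-envelope check on those figures (the function name and the 0.7 restore-speed factor below are illustrative assumptions, not measured Commvault numbers):

```python
# Sketch of the restore-window arithmetic behind the figures above.

def hours_to_move(size_tb: float, rate_tb_per_hr: float) -> float:
    """Hours needed to stream size_tb at a given throughput."""
    return size_tb / rate_tb_per_hr

backup_rate_tb_per_hr = 20.0   # over 16 Gb/s FC, per the figures above
db_size_tb = 100.0             # the "100 TB monster"

backup_hours = hours_to_move(db_size_tb, backup_rate_tb_per_hr)
# Restores typically run slower than backups; 0.7 is an assumed factor
# that reproduces the ~7-hour restore quoted above.
restore_hours = hours_to_move(db_size_tb, backup_rate_tb_per_hr * 0.7)

print(backup_hours)              # 5.0
print(round(restore_hours, 1))   # 7.1
```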

It is expensive to run huge databases, in the hundreds of TB, and some serious infrastructure is required: RAC, partitioning, the Diagnostics/Tuning packs, standby, and the SAN and network infrastructure adequate to back it all up. On the SAN side you may also need additional licenses, such as TimeFinder Clone, HUR or SnapVault.

On 03/27/2018 05:51 AM, Dominic Brooks wrote:

I’m not a DBA as such, and I’ve always skipped over most of the chapters on RMAN etc., so I'm very grateful for expert opinions on this, please.

  1. We have multi-TB DBs, as everyone does.
  2. The message from DBA policies is that we can only restore at 2 TB per hour.
  3. We have an RTO of 2 hours.

As a result, a wide initiative has been pushed down onto application teams: there is an implied 4 TB limit on any of the critical applications’ databases, for the event that we run into one of those scenarios where we need to restore from backup.
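The arithmetic behind that limit is simply restore rate times RTO (function names here are illustrative; the 2 TB/hr and 2-hour figures are from the points above):

```python
# The policy's implied size cap, and its inverse: the restore rate
# you would need to keep a larger database within the same RTO.

def max_restorable_tb(restore_rate_tb_per_hr: float, rto_hours: float) -> float:
    """Largest database restorable from backup within the RTO."""
    return restore_rate_tb_per_hr * rto_hours

def required_rate_tb_per_hr(db_size_tb: float, rto_hours: float) -> float:
    """Restore throughput needed to bring db_size_tb back within the RTO."""
    return db_size_tb / rto_hours

print(max_restorable_tb(2.0, 2.0))         # 4.0  -> the firm-wide 4 TB cap
print(required_rate_tb_per_hr(20.0, 2.0))  # 10.0 -> TB/hr needed for a 20 TB DB
```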

Initially, the infrastructure-provided solution was ZDLRA. Our firm’s implementation initially promised a 4 TB per hour restore rate, but in practice is delivering the 2 TB per hour restore rate above, and this is the figure given to the DBAs as a key input into this generic firm-wide policy.

My thoughts are that this is still an infrastructure issue and there are probably plenty of alternative infrastructure solutions to this problem.

But now it is being presented as an application issue.

Of course applications should all have hygienic practice in place around archiving and purging, whilst also considering regulatory requirements around data retention, etc, etc.  

But it seems bizarre to me to have this effective database size limit in practice, and I’m not aware of the approach above being common practice.

4TB is nothing by today’s standards.  

Am I wrong?

What different approaches / solutions could be used?  




Mladen Gogala
Database Consultant
Tel: (347) 321-1217

Received on Sun Apr 01 2018 - 06:57:44 CEST
