Re: RTO Challenges

From: Dominic Brooks <>
Date: Sat, 31 Mar 2018 20:23:16 +0000
Message-ID: <DB6P190MB05017954DCBEAADDB85E0E54A1A00_at_DB6P190MB0501.EURP190.PROD.OUTLOOK.COM>

Thanks all for replies.
Details appreciated.

I will have little influence on changing the status quo. We already have RAC and Data Guard. Another standby with delayed replication could be useful, but it sounds like if we need ALL scenarios to be met by a 2 hr RTO then some scenarios will always include a restore from backup, so the consequences are inevitable with the rates that we’re quoted (which I can’t influence).

We were already looking at sharding for various reasons (pros and cons, as with everything). Meeting some RTO scenarios is just another box that looks tickable with sharding.



On 31 Mar 2018, at 20:50, Mladen Gogala <> wrote:

Dominic, by far the most frequent trick for fast recovery of even the largest databases is snapshot revert. Of course, snapshots are not backups and have significant downsides, but you can have a reasonable snapshot retention policy of 24 hours and keep a normal streaming backup as the last line of defence. The first line of defence is RAC. The next is a standby database. The next is snapshots. And the last line of defence is always backup.

The speed of 2 TB per hour doesn't sound too good. With a dedicated 10 Gb Ethernet link, Commvault 11 can achieve between 3.5 TB/hr and 4 TB/hr, depending on the degree of deduplication. If you use 2 bonded 10 Gb wires, you can expect around 7 TB/hr. With the backup storage attached through a 16 Gb/sec FC adapter, the backup speed shoots up to 20 TB/hr. Even a 100 TB monster can be backed up in 5 hours and restored in about 7 hours.
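
As a rough cross-check of the figures above, here is a minimal Python sketch that converts a nominal link speed into a theoretical TB/hr ceiling and turns a throughput into elapsed hours. The function names are made up for illustration, decimal units (1 TB = 10^12 bytes) are assumed, and protocol overhead, deduplication and storage limits are ignored, so real rates will differ.

```python
def line_rate_tb_per_hour(gbit_per_sec: float) -> float:
    """Theoretical ceiling in decimal TB/hr for a link of the given Gbit/s.

    Assumes the wire is the only bottleneck (no protocol overhead, no dedup).
    """
    bytes_per_sec = gbit_per_sec * 1e9 / 8
    return bytes_per_sec * 3600 / 1e12


def hours_to_move(size_tb: float, rate_tb_per_hour: float) -> float:
    """Elapsed hours to back up or restore size_tb at a sustained rate."""
    return size_tb / rate_tb_per_hour


if __name__ == "__main__":
    print(line_rate_tb_per_hour(10))  # single 10 Gb link: 4.5
    print(line_rate_tb_per_hour(20))  # two bonded 10 Gb links: 9.0
    print(hours_to_move(100, 20))     # 100 TB at 20 TB/hr: 5.0
```

On those assumptions a single 10 Gb link tops out at 4.5 TB/hr, so the quoted 3.5 to 4 TB/hr is close to wire speed, and 100 TB in 5 hours corresponds to a sustained 20 TB/hr.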

It is expensive to run huge databases in the hundreds of TB, and some serious infrastructure is required: RAC, partitioning, diag/tuning pack, standby, and adequate SAN and network infrastructure to back it all up. On the SAN, you should also have additional licenses, like TimeFinder Clone, HUR or SnapVault.

On 03/27/2018 05:51 AM, Dominic Brooks wrote:

I’m not a DBA as such and I’ve always skipped over most of the chapters on RMAN etc., so I would be very grateful for expert opinions on this please.

  1. We have multi-TB DBs, as everyone does.
  2. The message from DBA policies is that we can only restore at 2 TB per hour.
  3. We have an RTO of 2 hours.

As a result, there is a wide initiative being pushed down onto application teams: an implied 4TB limit on any critical application’s database, to cover those scenarios where we need to restore from backup.
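
The arithmetic behind that implied limit can be sketched as below. This is only an illustration: the function name and the `overhead_hours` parameter (time lost to detection, decision-making and applying redo after the restore) are assumptions for the sketch, not part of the DBA policy.

```python
def max_database_size_tb(restore_tb_per_hour: float,
                         rto_hours: float,
                         overhead_hours: float = 0.0) -> float:
    """Largest database (TB) restorable from backup within the RTO.

    overhead_hours is a hypothetical allowance for everything that is not
    the raw restore itself; it is clamped so the result never goes negative.
    """
    usable_hours = max(0.0, rto_hours - overhead_hours)
    return restore_tb_per_hour * usable_hours


if __name__ == "__main__":
    print(max_database_size_tb(2, 2))       # quoted rate, 2 hr RTO: 4.0
    print(max_database_size_tb(4, 2))       # originally promised rate: 8.0
    print(max_database_size_tb(2, 2, 0.5))  # with 30 min of overhead: 3.0
```

The 4TB figure only holds if the whole 2 hours goes on the restore itself; any overhead shrinks it further.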

Initially, the infrastructure-provided solution was ZDLRA. Our firm’s implementation initially promised a 4TB per hour restore rate but in practice is delivering the 2TB per hour rate above, and this is the figure used by the DBAs as a key input into this generic firm-wide policy.

My thoughts are that this is still an infrastructure issue and there are probably plenty of alternative infrastructure solutions to this problem. But now it is being presented as an application issue. Of course applications should all have hygienic practice in place around archiving and purging, whilst also considering regulatory requirements around data retention, etc, etc.

But it seems bizarre to me to have this effective database size limit in practice and I’m not aware of this approach above being common practice. 4TB is nothing by today’s standards.

Am I wrong?
What different approaches / solutions could be used?




Mladen Gogala
Database Consultant
Tel: (347) 321-1217

-- Received on Sat Mar 31 2018 - 22:23:16 CEST
