Re: RTO Challenges

From: Niall Litchfield <niall.litchfield_at_gmail.com>
Date: Tue, 27 Mar 2018 18:06:55 +0100
Message-ID: <CABe10sYmawAZ8_op=kwgRkqxPw9AAQL5OUH=sD3b6gbNYwrkzQ_at_mail.gmail.com>



Hi

Our approach, and I would suggest the only sane approach, is to ensure that RTO (and RPO) are *business* decisions - there may well be different levels of service for different applications, of course. The application owner essentially gets to choose (and to fund the necessary infrastructure). The IT function absolutely has reference architectures and guidelines to help, but only the application owners can define what a 2-hour (or 5-minute) outage is worth. The other thing we strongly encourage is for application owners to think about what their failure scenarios are. Common examples might be:

  • server loss
  • data centre/AZ loss
  • region loss
  • logical failure (e.g. update asktom.questions set asked_date = sysdate; commit;)

The bottom line, though, is that availability, like performance, is an application feature, not an infrastructure constraint. There might, of course, be cost constraints. In your particular case, the implied bandwidth is 4.4 Gbit/sec - and, probably more pertinently, about 530 MiB/s of sustained restore throughput. *Guaranteeing* a significant improvement on those figures may well come at a large cost. It might be time to ask what doubling the restore capacity, or doubling the RTO, would each cost.
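For reference, those figures fall straight out of the quoted 2 TB/hour restore rate. A back-of-envelope sketch (assuming decimal terabytes, 1 TB = 10^12 bytes; the 530 figure is the same rate expressed in binary MiB/s):

```python
def restore_throughput(tb_per_hour: float):
    """Convert a restore rate in decimal TB/hour to Gbit/s and MiB/s."""
    bytes_per_sec = tb_per_hour * 10**12 / 3600   # 1 TB = 10**12 bytes
    gbit_per_sec = bytes_per_sec * 8 / 10**9      # decimal gigabits
    mib_per_sec = bytes_per_sec / 2**20           # binary mebibytes
    return gbit_per_sec, mib_per_sec

gbit, mib = restore_throughput(2)
print(f"{gbit:.1f} Gbit/s, {mib:.0f} MiB/s")  # 4.4 Gbit/s, 530 MiB/s
```

The same arithmetic gives the implied size ceiling: a 2 hour RTO at 2 TB/hour caps a restorable database at roughly 4 TB, which is exactly the policy you describe below.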

On Tue, Mar 27, 2018 at 10:51 AM, Dominic Brooks <dombrooks_at_hotmail.com> wrote:

> I’m not a DBA as such and I’ve always skipped over most of the chapters on
> RMAN etc so very grateful for expert opinions on this please.
>
>
>
> 1. We have multi-TB DBs, as everyone does.
> 2. The message from DBA policies is that we can only restore at 2 TB
> per hour.
> 3. We have an RTO of 2 hours
>
>
>
> As a result, there is a firm-wide initiative, pushed down onto application
> teams, imposing an implied 4TB limit on any critical application's
> database, to cover the scenarios where we need to restore from backup.
>
>
>
> The infrastructure-provided solution was ZDLRA. Our firm's implementation
> initially promised a 4TB per hour restore rate but in practice is
> delivering the 2TB per hour rate above, and this is the figure given to
> the DBAs as a key input into this generic firm-wide policy.
>
>
>
> My thoughts are that this is still an infrastructure issue and there are
> probably plenty of alternative infrastructure solutions to this problem.
>
> But now it is being presented as an application issue.
>
> Of course applications should all have hygienic practices in place around
> archiving and purging, whilst also considering regulatory requirements
> around data retention, etc.
>
>
>
> But it seems bizarre to me to have this effective database size limit in
> practice, and I'm not aware of this approach being common practice.
>
> 4TB is nothing by today’s standards.
>
>
>
> Am I wrong?
>
> What different approaches / solutions could be used?
>
>
>
> Thanks
>
>
> Regards,
>
> Dominic

-- 
Niall Litchfield
Oracle DBA
http://www.orawin.info

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Mar 27 2018 - 19:06:55 CEST
