RE: Minimize recovery time

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 27 Apr 2022 16:27:27 -0400
Message-ID: <06b301d85a75$36a13e40$a3e3bac0$_at_rsiz.com>



That depends. By the way, read what Tim Gorman wrote three or four times and let it percolate in. (Now, I have been able to recover, in ancient times, all the various bits and pieces [datafiles, individual tablespaces, sets of tablespaces], but I cheated, because my friend who wrote all that was specifically walking through the cases with me pre-beta release. So if I was about to do something different than he intended, we fixed the documentation; if I didn't have something required to proceed (like an appropriate vintage of the control file, or one more log than you might think you needed), he was sure about that with no time lost; and if (finally) something was wrong with the software, he was prepared to fix it (that didn't actually happen).) Heterochronous recovery does in fact work, but lots of folks get very confused about what it means to have parts of your database go through different time machines. (For example, if you do recover two tablespaces containing information that is part of a transaction system to different points in time, you very likely just cyber attacked yourself with respect to transactions.)
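
For illustration only, a tablespace point-in-time recovery is what produces exactly that "different time machine" situation; a minimal RMAN sketch, in which the tablespace name, timestamp and auxiliary path are all placeholders:

  # Recover one tablespace to an earlier point in time than the rest of the database.
  RMAN> RECOVER TABLESPACE users_hist
          UNTIL TIME "TO_DATE('2022-04-27 09:00:00','YYYY-MM-DD HH24:MI:SS')"
          AUXILIARY DESTINATION '/u01/aux';
  # After TSPITR the tablespace is left offline; back it up and bring it online:
  RMAN> BACKUP TABLESPACE users_hist;
  RMAN> SQL "ALTER TABLESPACE users_hist ONLINE";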

Flashback of some duration tends to consume resources so you would need to test its effect on your system.  

A lagged application of logs can be made pretty doggone secure at what I would guess is much cheaper than the other options. That depends on your environment.  
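
As a rough sketch of such a lagged standby (the destination number, service name and the 240-minute delay below are only examples, not a recommendation):

  -- On the primary: ship redo but ask the standby to delay applying it (minutes).
  ALTER SYSTEM SET log_archive_dest_2 =
    'SERVICE=standby_tns ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
     DB_UNIQUE_NAME=standby DELAY=240' SCOPE=BOTH;

  -- On the standby: start managed recovery without NODELAY (and without
  -- real-time apply), so the DELAY attribute is honored.
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

  -- Or, if the broker is in use, something like:
  -- DGMGRL> EDIT DATABASE 'standby' SET PROPERTY DelayMins = 240;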

Read what Tim wrote one more time.      

From: Lok P [mailto:loknath.73_at_gmail.com] Sent: Wednesday, April 27, 2022 3:58 PM
To: Karthikeyan Panchanathan; Mark W. Farnham; tim.evdbt_at_gmail.com Cc: Andy Sayer; Oracle L
Subject: Re: Minimize recovery time  

Actually, the top consumers are two partitioned tables (together ~3TB in size), followed by index partitions and a few non-partitioned indexes. Those two tables hold all the data needed for the business and they are not historical, so we can't purge/move them to another system. There are 2-3 other tables totaling ~2TB in size, and we can move those out. So it seems we may be able to achieve the RTO with the current system if we can compress the data effectively for a couple of these tables. But yes, I wanted to understand which compression is suitable for OLTP-type data/systems with minimal performance overhead.
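
For reference, the OLTP-oriented option on 12.1 is Advanced Row Compression (the pre-12c syntax was COMPRESS FOR OLTP, and it needs the Advanced Compression Option license). A hypothetical example against a single partition, with made-up table and partition names:

  -- Set the compression attribute for future inserts into one partition.
  ALTER TABLE sales MODIFY PARTITION sales_2021_q4 ROW STORE COMPRESS ADVANCED;
  -- MODIFY alone does not rewrite existing rows; MOVE rewrites the blocks:
  ALTER TABLE sales MOVE PARTITION sales_2021_q4
    ROW STORE COMPRESS ADVANCED UPDATE INDEXES ONLINE;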

"Lag application of logs on one of them long enough that you will be able to detect a cyber attack."  

Also, thinking about the above point, would it be a good idea to turn on the flashback option to cater to this cyber attack scenario, with a retention period of 3 days/72 hours or more?
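
For what it's worth, a minimal sketch of what turning that on would involve (the FRA size, disk group and 72-hour target are placeholders; flashback logging needs fast recovery area space and adds write overhead, so it wants testing as Mark noted above):

  -- Flashback Database writes flashback logs into the fast recovery area.
  ALTER SYSTEM SET db_recovery_file_dest_size = 2048G SCOPE=BOTH;
  ALTER SYSTEM SET db_recovery_file_dest = '+RECO' SCOPE=BOTH;
  -- Retention target is in minutes: 72 hours = 4320 minutes.
  ALTER SYSTEM SET db_flashback_retention_target = 4320 SCOPE=BOTH;
  ALTER DATABASE FLASHBACK ON;
  -- Note: the retention target is best effort; only a guaranteed restore point
  -- actually guarantees the ability to rewind that far.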

On Wed, Apr 27, 2022 at 10:01 PM Karthikeyan Panchanathan <keyantech_at_gmail.com> wrote:

We handled a similar scenario of how quickly we could recover when the DB was 10TB. RTO was less than 3 hours.

In our case we had a lot of old (history) compliance data with a longer retention policy. In my view, this data was just using the DB as storage.

We worked with Compliance and the Business to push the history data into one schema/one tablespace, then exported it as a Data Pump dump and archived it on tape with the same data retention policy.
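
As a sketch, the export/archive step can be as simple as a schema-mode Data Pump export before the dump files are written to tape (the schema, directory and file names below are invented):

  # Export the consolidated history schema; the dump files then go to tape.
  expdp system@ORCL schemas=HIST_ARCH directory=DP_DIR \
        dumpfile=hist_arch_%U.dmp logfile=hist_arch_exp.log parallel=4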

Once exported, that data was purged to reduce the DB size. We were able to bring the RTO under 3 hours.

It worked in our scenario. Sharing it here in case it is helpful.

Karth  



From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of Lok P <loknath.73_at_gmail.com> Sent: Wednesday, April 27, 2022 11:53:34 AM To: Mark W. Farnham <mwf_at_rsiz.com>; Andy Sayer <andysayer_at_gmail.com> Cc: Oracle L <oracle-l_at_freelists.org> Subject: Re: Minimize recovery time  

Just checked, it's really not for guarding against a multi-location disaster but rather for the cyber attack case: if the data is corrupted in the primary, the same corruption will be propagated to the secondary/Data Guard site. So in that case we would need to rely on the backup/recovery process RTO.

Also, with regard to table/index compression: we are seeing in another database that table compression with the 'compress for query high' option decreases the size of the data to 1/3rd of the original. So is it safe to go with this compression as an initial approach and test this OLTP application against it? For indexes there appears to be only key compression, so we need to look carefully at the non-unique indexes, if any, and what storage space benefit we would get out of it. Correct me if I'm wrong.
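
For what it's worth, a hypothetical sketch of both (segment names are placeholders; HCC QUERY HIGH is generally aimed at data that is bulk-loaded and rarely updated, since conventional DML on HCC blocks pushes rows down to a lower compression level):

  -- Hybrid Columnar Compression (Exadata): rewrite an older partition as QUERY HIGH.
  ALTER TABLE orders MOVE PARTITION orders_2020
    COMPRESS FOR QUERY HIGH UPDATE INDEXES;

  -- Index compression is prefix (key) compression; the number is how many
  -- leading columns form the shared prefix.
  ALTER INDEX orders_cust_ix REBUILD COMPRESS 1 ONLINE;

  -- Oracle can suggest a prefix length and the expected space saving:
  ANALYZE INDEX orders_cust_ix VALIDATE STRUCTURE;
  SELECT opt_cmpr_count, opt_cmpr_pctsave FROM index_stats;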

On Wed, 27 Apr 2022, 7:08 pm Mark W. Farnham, <mwf_at_rsiz.com> wrote:

If the business requirement is truly for multiple-site disasters still providing business continuation, you have a difficult task.

First, you should try to gain an understanding amongst the stakeholders that IF you are guarding against multiple data center disasters (otherwise a dataguard or a remote standby catch-up and fail-over seems sufficient), that implies you have a third repository of the data far away from the first two, most likely with an agreement with a third party to spin up hardware to recover on at their site.  

Very likely they will then understand that your current set up for failover is sufficient for the requirement.  

IF I am wrong about that, then the most likely solution is to introduce time based partitioning of all the data that in fact has a date after which it is not allowed to be changed AND is not required for the operations that must be available for business continuation. (Rarely are old transaction histories required with the same immediacy as current inventory quantities, and so forth).  
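
A minimal sketch of such time-based partitioning (interval partitioning on a date column; all names are invented):

  -- One range partition per month, created automatically as data arrives.
  CREATE TABLE txn_history (
    txn_id    NUMBER         NOT NULL,
    txn_date  DATE           NOT NULL,
    payload   VARCHAR2(4000)
  )
  PARTITION BY RANGE (txn_date)
  INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
  ( PARTITION p_initial VALUES LESS THAN (DATE '2020-01-01') );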

IF sufficient data meeting those characteristics can be identified that will permanently keep you within your physical reload recovery window, then you also need to be in a position to shuffle (probably shrinking down free space and permanently doing useful attribute clustering) partitions to “slower recovery okay” tablespaces.  
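
Mechanically, shuffling a closed-off partition into such a tablespace is just a move; a sketch with invented names (on 12.1 the partition move can be done online):

  -- Move a no-longer-changing partition to a "slower recovery okay" tablespace,
  -- compressing it and maintaining its indexes as part of the move.
  ALTER TABLE txn_history MOVE PARTITION p_2020_01
    TABLESPACE hist_slow_ts
    COMPRESS FOR QUERY HIGH
    UPDATE INDEXES ONLINE;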

Then you can practice the plan to bring up immediately only those tablespaces required for operations that need the stated business continuation immediacy (continuing the reload and recovery of the other tablespaces after business of the critical functions resumes.)  
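
A rough sketch of that partial-restore drill in RMAN (tablespace names are placeholders; SYSTEM, SYSAUX and UNDO always have to come back before anything can open):

  RMAN> RUN {
    # Restore only what the critical functions need to run.
    RESTORE TABLESPACE system, sysaux, undotbs1, app_critical;
    # Leave the "slower recovery okay" tablespaces offline for now.
    RECOVER DATABASE SKIP TABLESPACE hist_slow_ts;
  }
  # Open the database (RESETLOGS if the control file came from backup), then
  # restore and recover the skipped tablespaces while the application runs:
  RMAN> RESTORE TABLESPACE hist_slow_ts;
  RMAN> RECOVER TABLESPACE hist_slow_ts;
  RMAN> SQL "ALTER TABLESPACE hist_slow_ts ONLINE";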

Avoiding this entire race dance is the point of online recovery mechanisms: Modern systems, often on SSD, quite often grow far too large to “back up” onto more persistent storage in terms of the ability to read that persistent storage back onto storage connected to your machine.  

Another possible way to do this is to plug multiple SANs into your machine(s). This, of course, does not handle the multiple site disaster problem. You don’t keep any “current” data on the alternated SANs (of which you have a minimum of two), because you never start overwriting your only complete backup file set. After a backup is complete, the relevant SAN is physically disconnected.

Then, in a “storage disaster”, after you clean up the host from the software hack that was likely the cause, you connect your most recent backup SAN and away you go.  

Not all machines have connections for plugging in multiple SANs, and of course you can’t make these SANs “virtual” storage. You’re unplugging one to make it air gapped from attack. You might have an air gapped machine to plug it into to run full surface scans and memory checks (SSD), but that entire set-up is non-networked.  

When they balk at the cost, perhaps it is time to engage a certified actuary to explain to them what rare case they are insuring against (and probably don’t have all the things they need to in order to make the plan possibly succeed.)  

And, of course, any plan you have that you don’t test regularly is just wishful thinking. Testing plug-in replacement storage is probably a bigger risk than relying on something like dataguard or storage snapshots.  

If they are worried about this, do they have multiple physically independent communications infrastructure? How about power generators?  

Good luck,  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Lok P Sent: Wednesday, April 27, 2022 7:21 AM
To: Andy Sayer
Cc: Oracle L
Subject: Re: Minimize recovery time  

Yes, they are in different data centers, and those are in different locations too.

And backups are being taken at both primary and secondary through ZDLRA, and I believe the respective backups are kept in the respective data centre in their configured ZDLRA storage.

On Wed, 27 Apr 2022, 4:06 pm Andy Sayer, <andysayer_at_gmail.com> wrote:

Your dataguard is using the same storage as your primary? Usually it would be a whole different data centre. Where are your backups going?    

On Wed, 27 Apr 2022 at 11:35, Lok P <loknath.73_at_gmail.com> wrote:

Yes, we have a Data Guard setup, but this agreement is in place in case both the primary and the Data Guard DB fail because of disaster or corruption, etc.

On Wed, 27 Apr 2022, 3:30 pm Andy Sayer, <andysayer_at_gmail.com> wrote:

Have you considered Data Guard? You'd have a secondary database always ready to fail over to.

Thanks,

Andy    

On Wed, 27 Apr 2022 at 10:50, Lok P <loknath.73_at_gmail.com> wrote:

Hello Listers, we have an Oracle Exadata (X7) database on 12.1.0.2.0, and it has now grown to ~12TB. As per the client agreement and the criticality of this application, the RTO (recovery time objective) has to be within ~4 hours. The team looking after backup and recovery has communicated a recovery rate of ~1 hour per ~2TB of data with the current infrastructure. So going by that, the current size of the database gives an RTO of ~6 hours, which is more than the client agreement allows (~4 hours).

Going through the top space consumers, we see those are table/index sub-partitions and non-partitioned indexes. Should we look into table/index compression here? But then I think there is also a downside to that for DML performance.
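
Before committing to anything, Oracle's compression advisor can estimate the ratio on a sample. A hedged PL/SQL sketch (the schema, table and scratch tablespace names are placeholders, and the parameter list follows the 12c DBMS_COMPRESSION documentation, so it is worth checking against your exact version):

  SET SERVEROUTPUT ON
  DECLARE
    l_blkcnt_cmp    PLS_INTEGER;
    l_blkcnt_uncmp  PLS_INTEGER;
    l_row_cmp       PLS_INTEGER;
    l_row_uncmp     PLS_INTEGER;
    l_cmp_ratio     NUMBER;
    l_comptype_str  VARCHAR2(100);
  BEGIN
    -- Estimate the Advanced Row Compression ratio for one table,
    -- using a scratch tablespace to build the sample.
    DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
      'USERS', 'APP_OWNER', 'BIG_TABLE', NULL,
      DBMS_COMPRESSION.COMP_ADVANCED,
      l_blkcnt_cmp, l_blkcnt_uncmp, l_row_cmp, l_row_uncmp,
      l_cmp_ratio, l_comptype_str);
    DBMS_OUTPUT.PUT_LINE(l_comptype_str || ' estimated ratio: ' || l_cmp_ratio);
  END;
  /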

I wanted to understand: is there any other option (apart from exploring a possible data purge) to get this RTO faster, or at least within the service agreement? How should we approach it?

Regards

Lok

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Apr 27 2022 - 22:27:27 CEST
