RE: Zero DataLoss

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 10 Nov 2022 06:48:18 -0500
Message-ID: <1be501d8f4fa$540ba400$fc22ec00$_at_rsiz.com>



Tim wrote: No organization will enjoy being informed that their own preferences for performance push them away from their goal of zero data loss.  

They also seem to poorly receive the news that a goal of zero data loss may make some applications impossible to implement usefully.  

Good luck. Welcome to the razor’s edge of giving true advice while attempting to not get fired.  

Does anyone enjoy being told they're about to step into quicksand? No. Tim's probably correct, though, and whether or not the warning generates respect says a lot about management.

I’ve personally had more than one contract not get renewed or extended for statements that began something like: “The implications of trying to do what you are planning are…<list of things incompatible with their current stated goals>.”  

A few of the best contracts I've had resulted from similar statements. On net, I'm glad I made them.

Possibly another approach may float.  

For batch systems it is reasonably trivial to log transactions for replay, and that sort of serial output can reasonably be configured for minimal elapsed-time change in your normally running system. You do have to get the order of transactions correct to prevent problems such as violating a constraint against selling an item when inventory has reached zero, when you should have first replayed receiving more inventory. A minimal sketch of such an ordered replay log follows.

Planning is required to do that for interactive transactions and manual database changes, and anything ad hoc has to be fully logged even if it is “just a little fix.”  

If both are done well, that still cannot guarantee zero data loss, but you may be able to create a point-in-time recovery that can be played forward sans the destructive event.

That is of course entirely different from a generalized solution in database software (for which Tim explained the conflicting goals well), but it can be accomplished within defined goals for a specific system if the goals are not silly. It is definitely not a free lunch to build a system that way, though: the insurance value has to justify the extra cost and time to implement, and building the functionless harness that facilitates building the functions you need flies in the face of the notion of Agile.

Instead of just printing "Hello, World", the task is more like wanting to print "Hello, World" to everyone who wants to see it quickly, and being able to know precisely when and to whom you have printed "Hello, World."

Notice that the recovery and replay interfere with maximum availability by approximately the sum of:

  1. Time to notice something is wrong,
  2. Time to decide whether fixing what is wrong is worth an outage,
  3. Time to produce a running database at the point in time to begin the replay, with all the pipes in place to protect itself,
  4. Time to do the replay, and
  5. Time to test the resulting data set versus the broken data set, to your needed confidence level, that what was broken is now fixed and that nothing that was supposed to happen failed to happen.

No free lunch. The insurance value of losing nothing has to be high to build a system this way.  

Good luck,  

mwf  

From: Krishnaprasad Yadav [mailto:chrishna0007_at_gmail.com]
Sent: Wednesday, November 09, 2022 11:01 PM
To: tim.evdbt_at_gmail.com
Cc: Mark W. Farnham; Clay.Jackson_at_quest.com; Oracle L
Subject: Re: Zero DataLoss

Dear All,  

Thanks for the suggestions and detailed explanations.

Unfortunately, the customer is rigid about having fast sync in place, but in another database in the same environment we had an issue with it causing huge spikes leading to transaction timeouts, so I was trying to understand whether there is another way.

Thanks everyone for your time and explanations.

Regards,

Krishna  

On Thu, 10 Nov 2022 at 00:51, Tim Gorman <tim.evdbt_at_gmail.com> wrote:

A good adviser understands enough to detect disconnects between intentions and behaviors. Seeking zero data loss while prioritizing performance is a disconnect. Choose a goal, stick with it.

A great adviser understands enough to recommend a range of solutions, describing tradeoffs between them, and there are ALWAYS tradeoffs. MAX_AVAILABILITY can minimize data loss almost to the point of zero data loss, but the chance is always there. With a little help from other features like FarSync, MAX_AVAILABILITY can also minimize performance impact.

A trusted adviser shares insight even when it might be unwelcome, especially because it might be unwelcome. No organization will enjoy being informed that their own preferences for performance push them away from their goal of zero data loss.

On 11/9/2022 10:10 AM, Mark W. Farnham wrote:

I absolutely love this instance of “can’t be unseen once you see it” and I heartily endorse Tim’s statement on this as well as Clay’s kind words and his point.  

mwf  

From: Tim Gorman [mailto:tim.evdbt_at_gmail.com]
Sent: Wednesday, November 09, 2022 11:26 AM
To: Clay.Jackson_at_quest.com; mwf_at_rsiz.com; chrishna0007_at_gmail.com; 'Oracle L'
Subject: Re: Zero DataLoss

Friends,

I'd like to point out something perhaps subtle, but which can't be unseen once you see it.

The DataGuard product group at Oracle has been nothing less than brilliant in creating and naming the three modes of DataGuard...

  1. MAX_PROTECTION
  2. MAX_AVAILABILITY
  3. MAX_PERFORMANCE

The simple fact is that it is not possible to prioritize data protection, service availability, and service performance simultaneously. Only one of these can be the top priority, and the other two must be subordinate. Period. End of sentence.

If you are going to prioritize data protection (a.k.a. true ZERO data loss), then you must do so regardless of the impact on availability or performance of the database service. If zero data loss is really the goal, then the other considerations must be subordinate.

That is why MAX_PROTECTION is rarely used in real life. Very, very few organizations are willing to subordinate service availability. MAX_PROTECTION requires that if the standby is down, then the primary must be down too, which is absolutely what is required for maximum data protection. If a pending transaction cannot be protected, then it cannot be permitted to commit. This is not a limitation or a compromise, this is simply purity of vision.

It's almost like the old saying about three choices of good, fast, and cheap, except in this situation you can choose only one. There is always a trade-off.

The original poster's opening sentence about "zero data loss availability" shows fundamental confusion, because there is no such thing as data protection and availability with equal priority. Either "zero data loss" is the goal, or "availability" is the goal.

The original poster's organization has clearly chosen against "zero data loss" by prioritizing either availability or performance above it. They should listen to their own decision, and realize that the organization is implicitly prioritizing performance over data protection and availability.

There is no judgement here; it is simply a fact. Zero data loss is not possible unless it is the priority.

All that being said, MAX_AVAILABILITY does minimize the possibility of data loss substantially, and with the inclusion of a FarSync instance, can also greatly minimize impact on performance. MAX_AVAILABILITY is the mode that most closely approaches the ideal of meeting all three priorities, but still it requires dedication to service availability as the top priority, making data protection and performance subordinate. So by choosing MAX_AVAILABILITY, be clear that there must be a negative impact on data protection (i.e. RPO > 0) and on application performance, albeit minimized. Likewise, choosing MAX_PERFORMANCE must be accompanied by acceptance of a significant negative impact on data protection as well as service availability.

Choose one of the three modes, and understand all the implications of that choice. Also, understand what must be improved infrastructurally in order to adhere to the chosen mode. Nobody uses MAX_AVAILABILITY or MAX_PROTECTION on the "cheap".

Once you see it, you can't unsee it.

Hope this helps...

-Tim

On 11/9/2022 7:04 AM, Clay Jackson (Clay.Jackson) wrote:

As usual, MWF hit the relevant points, except perhaps not standing in the predicted meteor impact zone😊. Also consider these points:

For ANY zero data loss system it’s possible to come up with a scenario where (committed) data will be lost.

Attempting to achieve zero data loss can be an infinite resource (time and money) sink, so one should ALWAYS consider things like the cost and probability of that “last bit” of data actually being “lost”.  

And something to think about with MAA: all MAA, or any "two-phase commit" system, does is prevent a transaction from committing until there are multiple copies of said transaction (presumably at least one of which would be "secure"). In a failure scenario, what really happens is that the "last transaction", which in fact MAY have been "committable" in a non-MAA environment, doesn't get committed and, like any other "in-flight" transaction, is "lost". All you're really doing is changing the timing.

Good luck!  

Clay Jackson        

From: oracle-l-bounce_at_freelists.org On Behalf Of Mark W. Farnham
Sent: Wednesday, November 9, 2022 5:27 AM
To: chrishna0007_at_gmail.com; 'Oracle L' <oracle-l_at_freelists.org>
Subject: RE: Zero DataLoss


Presuming you’re already doing some sort of log application to a recovery system, a radio accessible way to pull the redo logs from a “dead” data center to be taken to your remote recovery site is probably the best you can do. Axxana Inc had this sort of hardware, but I’m getting a problem trying to visit their website to copy a link to you.  

IF a given "disaster" gives a little warning, you can update a custom table (say, insert a row with the current SCN and timestamp), commit, alter system switch logfile, and alter system archive log all, to accelerate shoving all the transactions committed so far to your recovery system. You can also have a policy that switches the database to restricted session in the event of a disaster "early warning," but notice that in our mostly hacked world that is a slippery slope under time pressure for analysis. In combination, those steps maximize the chances of shoving the required redo logs to your remote recovery systems in time.

Short of the overhead of MAA, or a piece of hardware that keeps a plex of your logfiles and archived logfiles and can transmit them from a burned-up building that is buried in an earthquake crevasse and flooded, that's about as close to zero as you are going to get. (I didn't mention nuclear, because if you put a radio transmit unit in an EMP cage, you probably interfere with its ability to transmit. I leave that as an exercise for the community.)

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Krishnaprasad Yadav
Sent: Tuesday, November 08, 2022 10:55 PM
To: Oracle L
Subject: Zero DataLoss  

Dear Experts,  

We want to achieve zero data loss availability in our environment. For this we were planning to put MAA in the database, but we saw redo overhead causing LGWR wait events, so we put it back to maximum performance.

Apart from using ZDLRA and MAA, is there any other solution we can use to achieve this?

Regards,

Krishna            

--

http://www.freelists.org/webpage/oracle-l

Received on Thu Nov 10 2022 - 12:48:18 CET
