Re: Bi-Directional replication

From: Mark Van de Wiel <mark.van.de.wiel_at_gmail.com>
Date: Mon, 26 Sep 2016 22:29:15 -0700
Message-ID: <fb8562b5-ac97-c9dd-e81e-da1b2a42c8fe_at_gmail.com>


A few other considerations include:

- Triggers (if you have any): you probably don't want these to fire when you replicate data, so consider updating all triggers not to fire when your replication user makes changes to the database.
- Cascade deletes can generally cause challenges with replication, but even more so in active/active scenarios.
- If the application truncates data, you may run into challenges.

Also, regarding Ryan's point 2 below: can you detect potential conflicts (e.g. the same row updated on two databases within the latency window between them) and possibly resolve them as part of the replication?
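To make the question concrete, here is a minimal sketch in Python of what "detect potential conflicts within the latency window" could look like. The record layout, site names, and the 3-second window are all assumptions for illustration; a real tool (GoldenGate CDR, Veridata, etc.) would work from the change logs, not an in-memory list.

```python
from datetime import datetime, timedelta

# Hypothetical change records: (primary_key, site, commit_time).
# In practice these would come from each database's change log.
changes = [
    ("row-42", "dc_east", datetime(2016, 9, 26, 17, 0, 0)),
    ("row-42", "dc_west", datetime(2016, 9, 26, 17, 0, 1)),
    ("row-99", "dc_east", datetime(2016, 9, 26, 17, 0, 5)),
]

# Assumed replication latency between the two sites.
LATENCY_WINDOW = timedelta(seconds=3)

def find_conflicts(changes, window=LATENCY_WINDOW):
    """Flag keys changed on more than one site within the window."""
    conflicts = set()
    for i, (key_a, site_a, t_a) in enumerate(changes):
        for key_b, site_b, t_b in changes[i + 1:]:
            if key_a == key_b and site_a != site_b and abs(t_a - t_b) <= window:
                conflicts.add(key_a)
    return conflicts

print(find_conflicts(changes))  # {'row-42'}
```

Once a conflict is flagged, you still need a resolution policy (last writer wins, site priority, or escalate to the business), which is exactly the people problem Ryan describes below.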

Mark.

On 9/26/2016 5:37 PM, Ryan January wrote:
We're not using Dbvisit for bi-directional replication, but we are doing it with GoldenGate. I asked similar questions two years ago and received no feedback. If there's one piece of advice I can give you out of the gate: think hard about why you want full bi-directional replication, and avoid it if at all possible. There are numerous very large issues you will need to dedicate serious time to resolving in your environment. In fact, I recently watched a talk by a few Google employees who described this as one of the hardest problems in data science. If you do not have 100% buy-in from the application owners to help resolve issues, the project will not succeed.

Here are a few things that we have run across.
1.) Sequences: if any sequences are used to drive a table's PK, you're going to have to consider alternate approaches. We're doing a mod(n) approach, where n is the number of replication points of origin you have.
2.) Data consistency issues: remember that you're replicating data between two databases, not replicating a database. That poses a new set of problems. What happens when the data already exists in the target (from a prior insert) and the DB sees an insert operation? If it violates the PK and/or a unique constraint, it will fail. You will want to do regular comparisons of the data (as you would with a Data Guard logical standby). It's not a question of if you'll run into inconsistencies; the real question is when, and how you'll resolve them. We're using Oracle Veridata, but there are a number of products that perform a similar function. Get buy-in from your app/business owner out of the gate that they will help resolve data consistency issues. When you have conflicting records, there is no way you'll know which record is 'correct' without the business's help.
3.) "In flight" transactions: any replication methodology on earth introduces some delay. Simple physics limits transmission from the US east coast to the US west coast to about 70ms. What happens when people insert (or update) data in multiple DCs at the same time? Which record do you want to keep in that scenario?
4.) App and schema upgrades now require 10 times more effort: if you're running live out of both datacenters, you have to ensure that your application is well decoupled from the DB schema. Think of this scenario: your app uses all columns of a given table, runs live in both data centers, and knows the table structure underneath it. You upgrade the application in datacenter A and add a column to a table. What does the application in datacenter B think when it sees the new column? Conversely, what happens when the new column has a NOT NULL constraint and data replicates in from datacenter B with that column null?
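To illustrate point 1 above: the mod(n) idea gives each site a disjoint residue class, so two sites can never mint the same key. A minimal sketch (site and function names are made up for illustration):

```python
# Site k of n issues k, k+n, k+2n, ... so values from different
# sites never collide, even with zero coordination between them.
def site_sequence(site_id, n_sites, start=0):
    value = start * n_sites + site_id
    while True:
        yield value
        value += n_sites

east = site_sequence(0, 2)   # 0, 2, 4, ...
west = site_sequence(1, 2)   # 1, 3, 5, ...
print([next(east) for _ in range(3)])  # [0, 2, 4]
print([next(west) for _ in range(3)])  # [1, 3, 5]
```

In Oracle the same effect comes from giving each site its own sequence, e.g. CREATE SEQUENCE ... START WITH k INCREMENT BY n, with a different k per site.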
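On point 2 above, the regular comparisons that tools like Veridata perform boil down to comparing row digests per primary key across the two sites. A toy sketch of the idea, with made-up table data (a real tool streams and batches rather than holding tables in memory):

```python
import hashlib

def row_digest(row):
    """Stable digest of a row's column values (order matters)."""
    h = hashlib.sha256()
    for col in row:
        h.update(repr(col).encode())
        h.update(b"\x00")  # separator so ("ab",) != ("a", "b")
    return h.hexdigest()

def out_of_sync(table_a, table_b):
    """Return PKs whose rows differ or exist on only one side.
    Each table is a dict mapping pk -> tuple of column values."""
    keys = set(table_a) | set(table_b)
    return sorted(
        pk for pk in keys
        if pk not in table_a or pk not in table_b
        or row_digest(table_a[pk]) != row_digest(table_b[pk])
    )

a = {1: ("alice", 10), 2: ("bob", 20)}
b = {1: ("alice", 10), 2: ("bob", 21), 3: ("carol", 30)}
print(out_of_sync(a, b))  # [2, 3]
```

The hard part, as Ryan says, is not finding the divergent rows; it's deciding, with the business, which side is correct.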

These were all issues we anticipated going into the implementation. I was slightly nervous about how we'd fare in these situations; looking back, I should have been twice as nervous. GoldenGate tends to introduce about 2-3 seconds of delay in our data. Under normal operating conditions that's fine, until a service that normally runs in datacenter A fails over to datacenter B. Client requests never stop, and they're now asking for data they expect to exist, but the database doesn't know about it yet because of replication delay. You and the business will need to determine how these situations should be handled.

There is no way you'll be able to solve these issues yourself, which is why you need the assistance and backing of the business. From a technical perspective, each of these issues is easy to deal with. The problem is that behind every one of these technical issues is a people/political problem waiting to happen. A great item to research is the CAP theorem (Consistency, Availability, Partition tolerance). It will give you a fair understanding of the issues you'll face and the ways an application can be engineered around some of them.

If you have any specific questions about how we've approached these issues, I'd be happy to answer them. I wish someone had been in my position when we first considered going down this road.

Thank you,
Ryan

On Sep 26, 2016, at 5:04 PM, Vadim Keylis <vkeylis2009_at_gmail.com> wrote:

Good afternoon, everyone. I am working on designing a POC for bi-directional replication between two data centers using the dbvisit application. What are the general best practices when designing a database for bi-directional replication? In your experience, what design considerations should be made for bi-directional replication?

Thanks so much in advance,
Vadim


-- http://www.freelists.org/webpage/oracle-l Received on Tue Sep 27 2016 - 07:29:15 CEST
