Re: Bi-Directional replication

From: David Mann <dmann99_at_gmail.com>
Date: Tue, 27 Sep 2016 16:36:02 -0400
Message-ID: <CAGazuyX_NDW0LUv0ENw9ryk-1DAH7L8XyqE465Z4B16LFrGs2g_at_mail.gmail.com>



On Tue, Sep 27, 2016 at 1:05 AM, FreeLists Mailing List Manager < ecartis_at_freelists.org> wrote:

>
> Subject: Re: Bi-Directional replication
> From: Ryan January <rjanuary_at_gmail.com>
> Date: Mon, 26 Sep 2016 19:37:02 -0500
>
> We're not using DBVisit for bi-directional replication, however we are
> using it with goldengate. I asked similar questions 2 years ago, and
> received no feedback. If there's one word of advice I could give you out of
> the gate, strongly consider the reasons you're looking to go full
> bi-directional and avoid it if at all possible. There are numerous very
> large issues you are going to need to dedicate a serious quantity of time
> resolving in your environment. In fact, I recently watched a talk from a
> few google employees that stated google sees this as one of the hardest
> problems to resolve in data science. If you do not have the application
> owners 100% buy in to resolve issues the project is not going to succeed.
> Here are a few things that we have run across.
> 1.) Sequences: If any sequences are used to drive a table's PK, you're
> going to have to consider alternate approaches. We're doing a mod(n)
> approach, where n is the number of replication points of origin you have
> 2.) Data consistency issues: Remember that you're replicating data
> between two databases, not replicating a database. As such this poses a
> new set of problems. What happens when the data already exists in this
> environment (prior insert) and the DB sees an insert operation. If it's
> the PK and/or unique constraint it will fail. You will want to do regular
> comparisons with the data (as you would with a dataguard logical standby)
> It's not a question of if you'll run into inconsistencies. The real
> question is when and how you'll resolve them. We're using Oracle Veridata,
> however there are a number of products which performs a similar function.
> Get the buy in from your app/business owner out of the gate that they will
> help resolve data consistency issues. When you have conflicting records
> there is no way you'll understand which record is 'correct' without the
> business' help.
> 3.) "in flight" transactions: Any replication methodology on earth will
> introduce some sort of delay. Simple physics limits the speed of
> transmission from the US east coast to US west coast at about 70ms. What
> happens when people try to insert (or update) data in multiple DC's at the
> same time. What record do you want to keep in that scenario?
> 4.) app and schema upgrades now require 10 times more effort: If you're
> running live out of both datacenters you have to insure that your
> application is very decoupled from the DB schema. Think of this scenario.
> Your App is using all columns of a given table. It is running live in both
> data centers and knows the table structure underneath it. You upgrade the
> application in datacenter A, and add a column to a table. What does the
> application in datacenter B think when it sees the new column? Conversely,
> what happens when the new column has a not null constraint and sees data
> replicating in from datacenter B with that column having a null value?
>
> These were all issues we saw going into the implementation. I was
> slightly nervous about how we'd fare these situations. Looking back I
> should have been twice as nervous. Goldengate tends to introduce about 2-3
> seconds of delay in our data. Under normal operating conditions that's
> fine, until you have a service that is normally running in datacenter A,
> and it 'fails over' to datacenter B. Client requests never stop, and
> they're now asking for data they expect to exist, and the database doesn't
> know of them yet (because of replication delay). You and the business will
> need to determine how these situations should be handled.
>
> There is no way you'll be able to solve these issues yourself, which is
> why you need the assistance, and backing of the business. From a technical
> perspective each of these issues are all easy to deal with. The problem is
> that behind every one of these technical issues is a people / political
> problem waiting to happen. A great item to research is the CAP theorem
> (Consistency, Availability, Persistence). It will give you a fair
> understanding of the issues you'll be facing and ways an application can be
> engineered to get around some of them.
>
> If you have any specific questions about how we've approached some of
> these issues I'd be happy to answer any additional questions you have. I
> wish someone would have been in my position when we first considered going
> down this road.
>
> Thank you,
> Ryan
>
> > On Sep 26, 2016, at 5:04 PM, Vadim Keylis <vkeylis2009_at_gmail.com> wrote:
> >
> > Good afternoon everyone. I am working on designing POC bi-directional
> replication between two data centers using dbvisit application. What are
> the general best practices when designing database for bidirectional
> replication ? What are the general design consideration should be made from
> your experience for bi-directional replication?
> >
> > Thanks so much in advance,
> > Vadim
>
>
Ryan,

I hate that I missed your message from two years ago. You definitely summed up the issues with bidirectional replication. I find it funny how many projects I work on today where they assume that a 'swich' can be flipped to make any application replicate bidirectionally and work in an Active-Active fashion. I usually open my big mouth and start rattling off a lot of the issues you listed... Then I call for an increase of project scope with associated funding and that's the end of it :)

In the late 90's I worked with Sybase SQL Anywhere (previously Watcom SQL) which could replicate bidirectionally. We set up a hub-and-spoke model replication to consolidate Point of Sale data at a HQ location. Luckily we were just running reports at he HQ and not entering bulk amounts of transactions but I can see the Active-Active nature of some designs can put a lot of strain on infrastructure. I have worked with GoldenGate a bit but so far haven't done any bi-directional replication except for messing around.

Key management is one of the fun ones. Since we had variable numbers of locations we used a compound PK which had the location number + a standard numeric PK. This made it so each location could generate their own keys without worry of collisions. We also investigated doling out keys via key pools but the management of that was quite involved and introduced a dependency on whatever node was going to hand out the blocks of keys.

These replication systems (Sybase and GoldenGate) offer conflict resolution capabilities but handling all of the cases where conflicts can happen... and then getting everyone to decide what to do about the conflict is a whole different story.

-Dave

-- 
Dave Mann
General Geekery | www.brainio.us
Database Geekery | www.ba6.us | _at_ba6dotus | http://www.ba6.us/rss.xml

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 27 2016 - 22:36:02 CEST

Original text of this message