Re: efficient compare

From: David Cressey <dcressey_at_verizon.net>
Date: Mon, 24 Apr 2006 22:21:37 GMT
Message-ID: <Rbc3g.2530$Qe.1509_at_trndny09>


"Andersen" <andersen_800_at_hotmail.com> wrote in message news:444CF0EB.7000604_at_hotmail.com...
> Bob Badour wrote:
>
> > But if you consider the example I gave previously that involved a single
> > insert into each table, swapping logs would be trivial.
> >
> > Likewise, by comparing the size of the log to the size of the relation
> > variable, one could choose to send the relvar instead of the logs.
> > However, one would then have trouble reconciling multiple changes to
> > similar tuples.
>
> I want a generic algorithm. You are right that swapping logs is
> excellent for your example, but what if one node does not have half of
> what the other node has, then logs might be a bad idea. I want to
> periodically run this algorithm to synchronize the nodes. Something
> similar to what I described in the other posting (the algorithm I gave),
> but maybe something more efficient.

OK, let's take it from the top.

You want to synchronize sets A and B, after each of them has been subjected to inserts and deletes. For some reason, the idea of keeping and swapping logs of the changes is impractical. I don't get this, but no matter. And you want a basic algorithm, not just a tool.

OK, let's start from there. It seems to me that one basic algorithm would be the merge algorithm. If you stored A and B in such a way that you could easily retrieve both of them in some order, you could apply a simple merge process to determine which elements are missing from either A or B. To simplify, let's say that A and B are tables with at least one candidate key, and that a candidate key has been arbitrarily chosen as the primary key. Further, let's say that there is an index on the primary key in both tables, and that we're already paying the freight to keep the two indexes up to date for some other reason.
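Purely to make the retrieval step concrete (the table and column names are invented, and I'm assuming a Python DB-API style connection such as sqlite3's), the per-side key scan might look like this:

    def sorted_keys(conn, table, pk):
        # Pull only the primary keys, in index order; the index
        # makes the ORDER BY cheap. (Don't build SQL like this
        # from untrusted input -- this is only a sketch.)
        cursor = conn.execute("SELECT %s FROM %s ORDER BY %s" % (pk, table, pk))
        for (key,) in cursor:
            yield key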

Now we can retrieve A and B in the same order relatively quickly. If we run a merge, zigzagging back and forth between advancing A and advancing B, we detect A minus B and B minus A in the process. The question then is how to split this work between the two computers while swapping a minimal amount of data for the rows that match. At the very least, you could exchange only primary keys instead of entire records, and each side could detect keys not present in its own copy of the set. Entire rows would then be exchanged only when needed.
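Here is a rough sketch of that zigzag merge, in Python purely for concreteness (the names are mine, not anything standard). It assumes each side has already produced its primary keys in sorted order, as above:

    def diff_sorted_keys(keys_a, keys_b):
        # One pass over two sorted key lists. Returns the keys
        # present only in A and the keys present only in B.
        only_a, only_b = [], []
        i = j = 0
        while i < len(keys_a) and j < len(keys_b):
            if keys_a[i] == keys_b[j]:
                i += 1; j += 1                      # match: nothing to send
            elif keys_a[i] < keys_b[j]:
                only_a.append(keys_a[i]); i += 1    # B is missing this key
            else:
                only_b.append(keys_b[j]); j += 1    # A is missing this key
        only_a.extend(keys_a[i:])                   # tail of A has no partner in B
        only_b.extend(keys_b[j:])                   # tail of B has no partner in A
        return only_a, only_b

Once each side knows which keys the other lacks, it ships full rows for just those keys.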

This should reduce traffic. I suspect that there are ways of further compressing the data to be exchanged on matching rows. I admit that this algorithm doesn't cover updates made to non-key data in A and B. Is that an important consideration in the problem you are addressing?
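If non-key updates do matter, one cheap extension (my own embellishment, not part of the algorithm above) would be to exchange a short hash of each row alongside its key: when the keys match but the digests differ, one side requests the full row.

    import hashlib

    def row_digest(row):
        # row is a dict of column name -> value. Two rows with the
        # same key but different non-key data get different digests,
        # so an update shows up during the merge as a digest mismatch.
        payload = repr(sorted(row.items())).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()[:10]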
