Re: efficient compare
From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sun, 23 Apr 2006 00:43:16 GMT
Message-ID: <E4A2g.64441$VV4.1212219_at_ursa-nb00s0.nbnet.nb.ca>
>
>
> That is the whole problem. I do not know how up to date A and B are, and
> I want to efficiently find that out so that I can send as little data
> over as possible. Logs might be huge, how do I know which part of the
> log to send over, I do not want to send all of it, unless it is needed!
> Your example with sending the size of the log over is in line with what
> I am looking for. But I want something much more efficient.
>
> Here is a naive scheme I was thinking about:
> Send a checksum the sorted set A to c(B), and vice versa.
> If the checksums match, you are done.
> If not, split the set into two parts, take the checksum of each of the 2
> parts and let the nodes exchange. For those that it does not match,
> repeat the divide and conquer scheme until you know which parts to
> exchange. This would mean you would be done in worst case log(N) rounds.
Date: Sun, 23 Apr 2006 00:43:16 GMT
Message-ID: <E4A2g.64441$VV4.1212219_at_ursa-nb00s0.nbnet.nb.ca>
Andersen wrote:
>> If B has a very outdated database, simply copy A over B. This assumes >> the data is sufficiently outdated that one can discard any changes to >> B that were never reflected in A. If B has not been around, simply >> copy A over B. Otherwise, have each send the other its logs from the >> last sync forward.
>
>
> That is the whole problem. I do not know how up to date A and B are, and
> I want to efficiently find that out so that I can send as little data
> over as possible. Logs might be huge, how do I know which part of the
> log to send over, I do not want to send all of it, unless it is needed!
> Your example with sending the size of the log over is in line with what
> I am looking for. But I want something much more efficient.
>
> Here is a naive scheme I was thinking about:
> Send a checksum the sorted set A to c(B), and vice versa.
> If the checksums match, you are done.
> If not, split the set into two parts, take the checksum of each of the 2
> parts and let the nodes exchange. For those that it does not match,
> repeat the divide and conquer scheme until you know which parts to
> exchange. This would mean you would be done in worst case log(N) rounds.
I don't see how you are going to get log(N).
Consider A & B are equal each with 65535 tuples. Insert a tuple into A
at the beginning of the sort order. Insert a tuple into B at the end of
the sort order.
None of your subdivided sets are going to give matching checksums except
in error until you get down to individual tuples.