Re: efficient compare

From: Andersen <andersen_800_at_hotmail.com>
Date: Sun, 23 Apr 2006 03:00:18 +0200
Message-ID: <444ad1b0$0$16012$892e7fe2_at_authen.yellow.readfreenews.net>


Bob Badour wrote:

>
> I don't see how you are going to get log(N).

log(N) message exchanges! Not bit complexity (i.e. size of the messages).

> Consider A & B are equal each with 65535 tuples. Insert a tuple into A
> at the beginning of the sort order. Insert a tuple into B at the end of
> the sort order.
>
> None of your subdivided sets are going to give matching checksums except
> in error until you get down to individual tuples.

Here is a different solution to that problem. Keep N bins, where N is some very large constant. Hash every tuple to an integer between 1-N and put it in the corresponding bin. Now run the divide and conquer on this.

I.e, first send checksum of the value of all bins (each bin might contain many tuples). Next level would divide the bins into two parts. Of course your smallest "atom" or level of granularity would be the number of bins, so if you have number of tuples far exceeding the number of bins, you'd be in trouble. But you should pick large enough number of bins... Received on Sun Apr 23 2006 - 03:00:18 CEST

Original text of this message