Re: efficient compare

From: Christopher Browne <cbbrowne_at_acm.org>
Date: Sun, 23 Apr 2006 08:12:22 -0400
Message-ID: <87fyk4h4nt.fsf_at_wolfe.cbbrowne.com>


In the last exciting episode, Bob Badour <bbadour_at_pei.sympatico.ca> wrote:
> Andersen wrote:
>
>> Bob Badour wrote:
>>
>>> I don't see how you are going to get log(N).
>> log(N) message exchanges! Not bit complexity (i.e. size of the
>> messages).
>
> You are not going to get log(N) in messages either.
>
>
>>> Consider A & B are equal each with 65535 tuples. Insert a tuple
>>> into A at the beginning of the sort order. Insert a tuple into B at
>>> the end of the sort order.
>>>
>>> None of your subdivided sets are going to give matching checksums
>>> except in error until you get down to individual tuples.
>> Here is a different solution to that problem. Keep N bins, where N
>> is some very large constant. Hash every tuple to an integer between
>> 1 and N and put it in the corresponding bin. Now run the divide and
>> conquer on this.
>> I.e., first send a checksum of the values of all bins
>
> But you said N is a large number which means you have to send a large
> number of checksums.

I wouldn't think so.
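
If you roll the per-bin checksums up into a tree, each round of the
exchange only carries checksums for the ranges that still disagree, so
the traffic tracks how different the two relations are rather than N.
A rough Python sketch of how I read Andersen's scheme (the bin count,
MD5, and helper names are my own choices, not his):

import hashlib

N_BINS = 1 << 16   # "some very large constant" N; 65536 here

def bin_of(tuple_bytes):
    # Hash a serialized tuple to a bin number in [0, N_BINS).
    return int.from_bytes(hashlib.md5(tuple_bytes).digest()[:4], "big") % N_BINS

def bin_checksums(tuples):
    # Per-bin checksum: XOR of the tuple digests, so tuple order never matters.
    sums = [0] * N_BINS
    for t in tuples:
        d = int.from_bytes(hashlib.md5(t).digest(), "big")
        sums[bin_of(t)] ^= d
    return sums

def range_checksum(sums, lo, hi):
    # Checksum of bins lo..hi-1, i.e. one node of the rolled-up tree.
    h = hashlib.md5()
    for s in sums[lo:hi]:
        h.update(s.to_bytes(16, "big"))
    return h.digest()

def differing_bins(sums_a, sums_b, lo=0, hi=N_BINS):
    # Each level of this recursion is one round of message exchange:
    # send the checksum of a bin range, and only split ranges that disagree.
    # Traffic is ~ (number of differing bins) * log(N_BINS), not N_BINS.
    if range_checksum(sums_a, lo, hi) == range_checksum(sums_b, lo, hi):
        return []
    if hi - lo == 1:
        return [lo]
    mid = (lo + hi) // 2
    return (differing_bins(sums_a, sums_b, lo, mid)
            + differing_bins(sums_a, sums_b, mid, hi))

With identical relations that is one checksum each way; with a handful
of differing tuples it is a handful of branches.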

I seem to recall Sybase having a "lock promotion" feature where they would basically conclude "Hey, you've locked so much of the table that we'll promote those page locks into a table lock. Yeah, it's a worse lock, but it's way cheaper to manage than 28,742 page locks, and any other processes trying to access this table were screwed anyways..."

This looks like a case for a similar technique.

In effect, "Hey, there are so few matches that we might as well do a bulk copy of the whole table."

-- 
wm(X,Y):-write(X),write('_at_'),write(Y). wm('cbbrowne','gmail.com').
http://linuxdatabases.info/info/lisp.html
"Sigh.  I like to  think it's just the Linux people who  want to be on
the `leading edge' so bad they walk right off the precipice."
-- Craig E. Groeschel
