
Re: efficient compare

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sun, 23 Apr 2006 21:26:51 GMT
Message-ID: <viS2g.64779$VV4.1227157@ursa-nb00s0.nbnet.nb.ca>


Andersen wrote:

> Bob Badour wrote:
>
>>> I.e, first send checksum of the value of all bins
>>
>> But you said N is a large number which means you have to send a large
>> number of checksums.
>
> Eh you misunderstood. Here it is more detailed.
> Have N bins. Hash every tuple to get an integer between 1..N. So every
> tuple now belongs to a bin.

[snipped example]
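For concreteness, the binning scheme quoted above might look something like the following sketch. This is my own illustration, not anything from the thread: the use of SHA-1 and the order-independent XOR of per-bin digests are assumptions.

    import hashlib

    def bin_of(tuple_bytes: bytes, n_bins: int) -> int:
        # Hash the tuple and map the digest to one of the N bins.
        return int.from_bytes(hashlib.sha1(tuple_bytes).digest(), "big") % n_bins

    def bin_checksums(relation, n_bins: int) -> list:
        # XOR the tuple digests within each bin so the per-bin checksum does
        # not depend on tuple order; each side computes this independently.
        sums = [0] * n_bins
        for t in relation:
            data = repr(t).encode()
            digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
            sums[bin_of(data, n_bins)] ^= digest
        return sums

    def differing_bins(local, remote, n_bins: int) -> list:
        # Only the bins whose checksums disagree need their tuples exchanged.
        a, b = bin_checksums(local, n_bins), bin_checksums(remote, n_bins)
        return [i for i in range(n_bins) if a[i] != b[i]]

The question in the rest of this post is how much of that checksum traffic the scheme actually needs, relative to simply shipping the change log.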

We are dealing with three numbers: C, N, and M. C is the cardinality of the relation and is linearly proportional to the size of the relation. N is the number of bins (the size of the hash table), which is arbitrarily chosen to be large. M is the number of changes since the last synch and is linearly proportional to the size of the log file.

We established earlier that C >> M.

Assuming a uniform distribution and C > N, one can expect to find C/N tuples in each bucket. It makes no sense to choose N < M, because one then expects M/N > 1 updates in each bucket, so essentially every checksum mismatches and you send O(2N) messages plus O(C) tuples. If we choose N > M and the changes are uniformly distributed over the buckets, we can expect at most M buckets to have updates.
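To make that accounting concrete, here is a rough estimate of the expected traffic in the two regimes. The exact message counting is my own assumption, following the figures above and the estimate in the next paragraph; it is a sketch, not a precise cost model.

    import math

    def expected_traffic(C: int, N: int, M: int) -> dict:
        # C: cardinality of the relation, N: number of bins,
        # M: changes since the last synch; updates assumed uniformly spread.
        if N < M:
            # Every bin expects M/N > 1 updates, so essentially every
            # checksum mismatches: ~2N messages and the whole relation.
            return {"messages": 2 * N, "tuples": C}
        # With N > M, at most M bins differ: roughly M * log2(N) worth of
        # checksum traffic plus M * (C/N) tuples to exchange.
        return {"messages": M * math.log2(N), "tuples": M * (C / N)}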

In the example snipped, you give M = 1 and log(N) = 160. I simply observe that 160 >> 1. For M << N, your example suggests you will need to send O(M * log(N)) messages plus O(M * C/N) tuples, which is much larger than the O(M) cost of shipping the log file itself.
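Plugging in the snipped example's numbers (M = 1, log(N) = 160, so N = 2^160), with C chosen arbitrarily here since the post does not give it:

    # Values from the snipped example: M = 1, log2(N) = 160 (so N = 2**160).
    # C is assumed purely for illustration; the post does not give it.
    C, M, log_N = 1_000_000, 1, 160
    N = 2 ** log_N

    checksum_messages = M * log_N    # O(M * log N)  -> 160
    tuples_shipped    = M * C / N    # O(M * C/N)    -> effectively 0 here
    logfile_cost      = M            # shipping the log itself -> 1

    print(checksum_messages, tuples_shipped, logfile_cost)  # 160, ~0, 1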

What did I misunderstand?

Received on Sun Apr 23 2006 - 16:26:51 CDT

