Re: efficient compare (comp.databases.theory)
Andersen wrote:
> Bob Badour wrote:
>
>> As an example of the importance of physical structure, where do you intend to evaluate this result? At the computer containing A, c(A), at the computer containing B, c(B), or at some other computer c(C)?
>> As another example, what is the maximum size of a single datagram passed over the network and what is the size of the representation of the tuples? I can think of an optimization that would improve performance if two or more tuples fit in a single datagram.
>> If you are trying to minimize traffic between the computers, then presumably the cost of any sorts or index creation on the computers matters less. But then again, one would still have to weigh the expected savings against the costs.
It won't matter in the replication case. Use the logs.
>> Do you envision this as part of a distributed database? Some sort of replication architecture? To perform a merge-purge of mailing lists? Simply to reconcile two similar but independent databases?
Then use the logs as I already suggested.
>> Since the size of the intersection will be relatively small and because the dbmses will have to reconcile updates temporally, it probably makes sense to just share log files from the previous checkpoint forward. Compressing the log files would be your primary efficiency opportunity.
If B has a very outdated database, simply copy A over B. This assumes the data is sufficiently outdated that one can discard any changes to B that were never reflected in A. If B has not been around, simply copy A over B. Otherwise, have each send the other its logs from the last sync forward.
If nothing has changed in either A or B, the logs from the last sync forward will be empty. In this case A=B, and the algorithm will be very efficient.
I suppose as an optimization, one can first send over the smaller log and then send back an edited log of just the changes required. Suppose c(A) sends a request to sync to c(B) and includes the size of the logs for A. c(B) then compares that figure with the size of the logs for B. If the logs of B are smaller, c(B) responds to the request with the logs of B. If the logs of B are larger, c(B) requests the logs from A.
At this point, either c(A) or c(B) has all of the information. Suppose c(B) has all of the information. It detects all of the changes required. It applies the changes required to B and sends a log of the changes required for A back to c(A). c(A) then applies those changes and the sync is complete.
If A and B had identical changes since the last sync, c(B) will detect that no changes are required and send c(A) an empty log of changes.
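
The exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular dbms does it: the `Change` and `Site` types and all function names are hypothetical, log "size" is taken to be the number of entries, conflicts are resolved by keeping the change with the latest timestamp, conflicting changes are assumed never to share a timestamp, and when both logs are the same size c(B) is arbitrarily chosen to do the merge.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass(frozen=True)
class Change:
    ts: int      # assumed logical timestamp; conflicting changes never share one
    key: str
    value: Any

@dataclass
class Site:
    name: str
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # changes since the last sync

    def write(self, ts, key, value):
        self.data[key] = value
        self.log.append(Change(ts, key, value))

def latest_per_key(log):
    """Map each key to its change with the highest timestamp."""
    latest = {}
    for ch in log:
        if ch.key not in latest or ch.ts >= latest[ch.key].ts:
            latest[ch.key] = ch
    return latest

def required_changes(winners, site_log):
    """Winning changes that the site owning site_log does not already hold."""
    mine = latest_per_key(site_log)
    return [w for k, w in sorted(winners.items())
            if k not in mine or mine[k].value != w.value]

def apply_changes(site, changes):
    for ch in changes:
        site.data[ch.key] = ch.value

def sync(a, b):
    """One sync requested by a. Returns (name of merging site, log sent back)."""
    # c(A) sends its log size with the request; c(B) compares it to its own.
    if len(b.log) < len(a.log):
        merger, other = a, b   # c(B) responds with B's smaller log
    else:
        merger, other = b, a   # c(B) requests A's log (also on a tie: assumption)
    shipped = list(other.log)  # the only full log that crosses the network
    winners = latest_per_key(merger.log + shipped)
    apply_changes(merger, required_changes(winners, merger.log))
    send_back = required_changes(winners, shipped)  # edited log for the other site
    apply_changes(other, send_back)
    merger.log.clear()
    other.log.clear()
    return merger.name, send_back

a = Site("A", {"x": 1, "y": 1})
b = Site("B", {"x": 1, "y": 1})
a.write(1, "x", 2)
a.write(3, "y", 5)
b.write(2, "x", 9)
print(sync(a, b))        # c(A) merges, because B's log is the smaller one
print(a.data == b.data)
```

Only one full log and one edited log travel between the computers, and a second call to `sync` with nothing changed sends back an empty list.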
>> Pardon me for observing that it sounds like a question or essay topic for a course of some sort. People working for a dbms vendor implementing some sort of distributed database or replication feature would tend to keep abreast of the state of the art using much better sources than usenet.
None of what I wrote above required years as an implementer of dbmses. I simply reasoned it out on my own. The only item of consideration missing from your original question was the almost certain existence of the files that log all changes to the dbms.
I suggest where the state of the art lies is in the merging of two log files to determine the necessary changes and the potential to abbreviate both log files before the process begins. If the same value changes twice in a dbms, one might be able to get away with tossing out the first change and keeping only the last change. Triggered procedures, of course, will complicate this.

Received on Sat Apr 22 2006 - 08:54:23 CDT