Re: Concurrency in an RDB

From: David <davidbl_at_iinet.net.au>
Date: 19 Dec 2006 23:17:41 -0800
Message-ID: <1166599061.794681.166020_at_t46g2000cwa.googlegroups.com>


Marshall wrote:
> On Dec 19, 5:36 pm, "David" <davi..._at_iinet.net.au> wrote:
> > Fine-grained transactions are good until you use distributed
> > transactions. That doesn't worry me because IMO distributed
> > transactions are a poor approach that can and should be avoided most of
> > the time. This is not the conventional viewpoint. That may change
> > one day if Operational Transform proves to be useful for replicated
> > databases.
>
> So, howsabout you give us an example? So far all we've heard
> from you has been a description of properties of your proposed
> solution. Please show me an example of doing something
> modestly nifty with OT that might not hold up so well in
> a distributed transaction setting.
>
> Assume I'm already aware that network cards, switches,
> and whole machines periodically misbehave. (Oh, how
> I wish I was *less* aware of that.)

Real time, interactive editing of replicated data is very well suited to OT. Local edits are applied immediately irrespective of network latency. In fact the network can go down and users are able to continue local editing.

Operations are sent asynchronously, transformed as required and applied at each site. Users are never blocked, and high transaction throughput (measured in many thousands of transactions per second) is achieved by *avoiding* the need for concurrent write transactions (by making transactions very brief) rather than embracing writer concurrency (and allowing transactions to take a long time).
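To make the transform-and-apply step concrete, here is a minimal sketch of the classic OT transform for concurrent character inserts. The operation format and function names are illustrative, not taken from any particular OT library:

```python
# A toy OT sketch for concurrent single-character inserts into a shared
# text buffer. An op is (position, character, site_id).

def transform_insert(op, other):
    """Transform 'op' against a concurrent insert 'other' so it can be
    applied after 'other' while preserving the user's intent."""
    pos, ch, site = op
    opos, och, osite = other
    # Shift right if the concurrent insert landed before our position;
    # on an exact tie, break deterministically by site id so every
    # replica resolves the conflict the same way.
    if opos < pos or (opos == pos and osite < site):
        return (pos + 1, ch, site)
    return op

def apply_insert(text, op):
    pos, ch, _ = op
    return text[:pos] + ch + text[pos:]

# Sites 1 and 2 concurrently edit "abc": site 1 inserts 'X' at index 1,
# site 2 inserts 'Y' at index 1.
doc = "abc"
op1 = (1, "X", 1)
op2 = (1, "Y", 2)

# Each site applies its own op immediately, then the transformed remote op.
site_a = apply_insert(apply_insert(doc, op1), transform_insert(op2, op1))
site_b = apply_insert(apply_insert(doc, op2), transform_insert(op1, op2))
assert site_a == site_b == "aXYbc"  # both replicas converge
```

Note that neither user ever blocks: the transform runs when the remote operation arrives, however late that is.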

I have built a working system and the results are very impressive. In our office we've collaborated on a large virtual world of components including text documents, jigsaws, white boards etc. The system scales linearly with the number of users and handles long disconnection periods. The performance is so good that you can see objects moving around smoothly as they're being dragged with the mouse by other users. Watching many jigsaw pieces being moved simultaneously and snapping into position is quite striking.

The mathematics of OT is rather subtle, and the results can be surprising. It is particularly curious to work interactively on a jigsaw when the network drops out for a while and you lose the interactivity. When the network comes back the jigsaw pieces suddenly move to a result representing the combined efforts of the users. OT tries its best to preserve each user's intent, and only fails to do so when there is truly a conflict.

The system doesn't ask users how to resolve conflicts. Instead it makes an arbitrary decision itself! The important thing is that all sites agree on how the conflict was resolved so that at quiescence all sites converge to the same state. This is achieved despite the fact that the system is a true peer-to-peer system and supports arbitrary changes to the connection topology. The justification for resolving all conflicts automatically is that user edits are so fine grained that a user doesn't particularly care if an individual edit was annulled because of a conflict with another user. In the case of interactive collaboration the users can see each other's edits as they happen, and silent resolution of conflicts works very well in practice. Having dialogs pop up reporting conflicts would become an extreme nuisance.

The idea of fine-grained transactions relates directly to the user experience. Instead of presenting dialogs to the user, who must make changes then "commit" by clicking on an OK button on the dialog, the user instead *directly* edits the data without dialogs. This allows every single keystroke or mouse movement to be a separate operation (ie transaction). Compression through merging sets of old operations becomes important for performance.
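As an illustration of the compression step, here is a toy sketch that merges runs of consecutive single-character inserts into one operation. The operation format is hypothetical:

```python
# Compress fine-grained operations by merging adjacent inserts.
# An op is (position, text); each keystroke starts as a one-character op.

def compress(ops):
    """Merge runs of consecutive inserts into single larger inserts."""
    merged = []
    for pos, text in ops:
        # If this insert continues directly after the previous one,
        # fold it into that operation instead of keeping it separate.
        if merged and merged[-1][0] + len(merged[-1][1]) == pos:
            prev_pos, prev_text = merged.pop()
            merged.append((prev_pos, prev_text + text))
        else:
            merged.append((pos, text))
    return merged

# Five keystrokes typing "hello" collapse into one operation.
keystrokes = [(0, "h"), (1, "e"), (2, "l"), (3, "l"), (4, "o")]
assert compress(keystrokes) == [(0, "hello")]
```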

In the case of non-interactive collaboration (made possible with a repository and explicit support for updates, checkins, branching and merging), the system will again happily merge edits and make its own decisions about how to resolve conflicts. Unlike CVS, edits on the same line of text can always be merged unambiguously. Nevertheless it is possible that the merge has broken higher level semantic constraints that can't be validated by the system. For that reason, after a merge the user is always responsible for validating the result before committing.

The ability to tag a repository provides an alternative to long running, complex transactions. Tagging only involves storing a vector time so the cost is essentially zero. Branching and merging allow long running "tasks" (that may take weeks) to proceed without fighting over resources.
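A sketch of why tagging is essentially free: a tag is just a named copy of the current vector time, one counter per site, with no data copied. The repository representation here is hypothetical:

```python
# A tag records the repository's current vector time under a name.
# Space cost is O(number of sites), independent of repository size.

def tag(repository, name, vector_time):
    """Store a snapshot marker as a copy of the vector time."""
    repository["tags"][name] = dict(vector_time)

repo = {"tags": {}}
# Vector time after sites A, B and C have generated 12, 7 and 3 operations.
tag(repo, "release-1.0", {"A": 12, "B": 7, "C": 3})
assert repo["tags"]["release-1.0"] == {"A": 12, "B": 7, "C": 3}
```

Any state reachable in the operation history can later be reconstructed from the tagged vector time, which is what makes branching from a tag cheap.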

OT has no trouble eliminating the exposure to network I/O. Avoiding writer thread exposure to disk I/O depends mostly on using a good read/write cache. This works particularly well for edits on a virtual world because a user only edits objects that have been displayed on the screen, which tends to mean they are already memory resident. Disk reads on incoming operations can be avoided by a pre-fetching strategy. This is possible because the operation has already been fully recorded, so it is known up front what objects need to be fetched from disk.
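The pre-fetching idea can be sketched as follows. Because an incoming operation names the objects it touches, they can be faulted into the cache before the writer thread runs; the cache and store interfaces here are hypothetical stand-ins:

```python
# Apply incoming operations, pre-fetching their target objects so the
# writer never blocks on a disk read mid-transaction.

def apply_with_prefetch(ops, cache, store, apply_op):
    for op in ops:
        # The operation is fully recorded, so its targets are known up
        # front and can be read from the store before applying.
        for oid in op["targets"]:
            if oid not in cache:
                cache[oid] = store[oid]
        apply_op(op, cache)

store = {"doc1": {"text": "abc"}}  # stand-in for the disk-resident store
cache = {}

def apply_op(op, cache):
    cache[op["targets"][0]]["text"] += op["char"]

apply_with_prefetch([{"targets": ["doc1"], "char": "d"}], cache, store, apply_op)
assert cache["doc1"]["text"] == "abcd"
```

In a real system the pre-fetch would be issued asynchronously ahead of the apply step, overlapping disk latency with the processing of earlier operations.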

A B+Tree requires a more sophisticated approach because it can't be assumed that keys to be inserted/deleted are clustered in time - most generally they will be uniformly scattered across the B+Tree leaf nodes. I'm currently working on an implementation that employs delta sets for the addition and removal of (key,value) entries, together with a prefetching strategy to avoid disk reads by writer threads.
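The delta-set approach I'm describing can be sketched like this: pending additions and removals are buffered in memory and only folded into the disk-resident tree on flush, so writer threads never wait on disk. A plain dict stands in for the real B+Tree:

```python
# A toy delta set buffering (key, value) insertions and key removals,
# folded into the underlying tree only at flush time.

class DeltaBuffer:
    def __init__(self):
        self.added = {}       # pending (key, value) insertions
        self.removed = set()  # pending key deletions

    def insert(self, key, value):
        self.removed.discard(key)  # an insert cancels a pending delete
        self.added[key] = value

    def delete(self, key):
        self.added.pop(key, None)  # a delete cancels a pending insert
        self.removed.add(key)

    def flush(self, tree):
        """Fold the delta set into the tree: the only step touching disk."""
        for key in self.removed:
            tree.pop(key, None)
        tree.update(self.added)
        self.added.clear()
        self.removed.clear()

tree = {10: "a", 20: "b"}  # stand-in for B+Tree leaf entries
buf = DeltaBuffer()
buf.insert(30, "c")
buf.delete(10)
buf.flush(tree)
assert tree == {20: "b", 30: "c"}
```

Since keys arrive scattered across the leaf nodes, the flush is also where a pre-fetch of the affected leaves would be scheduled, in batch rather than per operation.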

Note that I believe that sometimes the more conventional pessimistic, long lived locking approach is better. It depends on the application.

Cheers,
David

Received on Wed Dec 20 2006 - 08:17:41 CET