Re: Concurrency in an RDB

From: David <davidbl_at_iinet.net.au>
Date: 15 Dec 2006 21:33:45 -0800
Message-ID: <1166247225.602611.204240_at_t46g2000cwa.googlegroups.com>


Sampo Syreeni wrote:
> On 2006-12-15, David wrote:
>
> > Note that OT is not a locking protocol. It allows for multi-users with
> > no locking at all.
>
> So what do you mean when you say "locking"? Delay? That isn't
> necessarily present. Exclusion? Neither. Any precaution at all? Well,
> sure. But then timestamping is a sort of precaution as is particular
> structuring of the permissible write transactions, and vice versa OT
> takes action after the fact, which the so called locking protocols
> don't. All of the optimistic, non-locking concurrency protocols
> eventually do so as well. After that I've even pointed out that so do
> the protocols relying on open transactions, semantic locking and higher
> level compensation. Only they sometimes don't.
>
> I'd say the partition into locking and non-locking protocols is hazy at
> best.

OT is very pure in that it *always* allows an operation to be generated and executed immediately on a local database with complete disregard for other sites. The network can go down for extended periods. It is assumed that the operations can be applied *asynchronously* at other sites (after transformation) with "intention preservation". There is no concept of distributed transactions or even of locking a remote resource. Atomicity is only required independently on each site's DB. Vector times ensure that all operations are applied exactly once at each site. It is even possible for a site to crash and recover back to an earlier state (subject to atomicity of the locally applied transactions). In other words, durability is nowhere near as important as in systems employing distributed transactions.
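As a rough illustration of the "applied exactly once" point, here is a minimal Python sketch of vector-time delivery. The names `Op`, `Site`, `generate` and `receive` are hypothetical and not taken from any particular OT system, and the transformation step itself is deliberately left out: the sketch only shows how a vector time lets a site ignore duplicates and hold back operations that arrive too early.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    origin: int    # index of the site that generated the operation
    vt: tuple      # vector time of the origin just after generating it
    payload: str


class Site:
    """Hypothetical replica; the OT transformation step is omitted."""

    def __init__(self, site_id, n_sites):
        self.site_id = site_id
        self.vt = [0] * n_sites   # vt[k] = number of ops from site k applied here
        self.applied = []         # payloads in the order they were applied
        self.pending = []         # received ops that are not yet deliverable

    def generate(self, payload):
        # Executed immediately on the local copy, with no remote contact.
        self.vt[self.site_id] += 1
        self.applied.append(payload)
        return Op(self.site_id, tuple(self.vt), payload)

    def _ready(self, op):
        # Deliverable when op is the next one from its origin and everything
        # it causally depends on has already been applied here.
        for k in range(len(self.vt)):
            if k == op.origin:
                if op.vt[k] != self.vt[k] + 1:
                    return False
            elif op.vt[k] > self.vt[k]:
                return False
        return True

    def receive(self, op):
        # Already applied, or already queued: ignore the duplicate.
        if op.vt[op.origin] <= self.vt[op.origin] or op in self.pending:
            return
        self.pending.append(op)
        progressed = True
        while progressed:
            progressed = False
            for queued in list(self.pending):
                if self._ready(queued):
                    self.applied.append(queued.payload)
                    self.vt[queued.origin] += 1
                    self.pending.remove(queued)
                    progressed = True


a, b = Site(0, 2), Site(1, 2)
op1 = a.generate("x")
op2 = a.generate("y")
# Messages arrive out of order and duplicated, as after a network outage.
for op in (op2, op1, op1, op2):
    b.receive(op)
```

After the loop, site `b` has applied each of `a`'s operations once, in generation order, despite the reordering and the repeats.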

> > OT imposes strong restrictions on the integrity constraints. It is not
> > permissible to annul an operation once it has been generated.
>
> What do you mean by this?

For example, an insertion into a text document can only have the insertion position adjusted by OT. The insertion itself is not disabled under inclusion transform. Therefore OT can't allow an integrity constraint on the maximum size of the text document.
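A small Python sketch of that situation, under stated assumptions: `Insert`, `include` and `apply_op` are illustrative names, the tie-break on site id is one common convention rather than the only one, and `MAX_LEN` is a made-up constraint. The inclusion transform only ever shifts the position; it has no way to cancel the insertion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int     # insertion offset in the document
    text: str    # text to insert
    site: int    # generating site, used only to break position ties


def include(op, other):
    """Inclusion transform: adjust `op` so it can follow a concurrent `other`."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        # `other` landed at or before our position, so shift right.
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op    # position unaffected; the insertion is never dropped


def apply_op(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]


MAX_LEN = 5      # hypothetical integrity constraint on document size
doc = "abcd"

# Each site checks the constraint locally, and each insertion is legal alone.
a = Insert(1, "X", site=0)
b = Insert(3, "Y", site=1)
assert len(apply_op(doc, a)) <= MAX_LEN
assert len(apply_op(doc, b)) <= MAX_LEN

# Each site applies its own operation first, then the transformed remote one.
site_a = apply_op(apply_op(doc, a), include(b, a))
site_b = apply_op(apply_op(doc, b), include(a, b))
```

Both sites converge on the same document, but it is six characters long: each insertion respected the limit where it was generated, and the transform could only move them, not refuse one.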

> > This limits the applications for which OT is suitable. Eg don't use OT
> > for reserving seats on an aeroplane.
>
> In a really small aeroplane company it is possible that seats are
> reserved by editing a shared text file. It is claimed that OT is
> suitable for shared editing of text files. Hence, it must be suitable
> for seat reservation under at least some conditions. Can you elaborate
> on what the conditions are, precisely?

If two customers used the Internet to connect to two geographically separated seat reservation systems that are synchronised using OT, then it is conceivable that both customers are given the same seat number. This is an example where centralisation and pessimistic locking are preferable.
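The double booking can be sketched in a few lines of Python. This is a toy model, not an OT implementation: `ReservationSite`, `reserve` and `apply_remote` are hypothetical names, and the only property it shares with OT is that each site decides locally and propagates the result afterwards.

```python
class ReservationSite:
    """Hypothetical replica that books seats without contacting other sites."""

    def __init__(self, seats):
        self.free = list(seats)
        self.bookings = {}            # customer -> seat

    def reserve(self, customer):
        seat = self.free.pop(0)       # decided locally, no remote lock taken
        self.bookings[customer] = seat
        return customer, seat

    def apply_remote(self, customer, seat):
        # A remote booking arrives later; it cannot be annulled here.
        self.bookings[customer] = seat
        if seat in self.free:
            self.free.remove(seat)


seats = ["1A", "1B", "1C"]
site_1 = ReservationSite(seats)
site_2 = ReservationSite(seats)

# Two concurrent reservations, one at each site, before any synchronisation.
booking_1 = site_1.reserve("alice")
booking_2 = site_2.reserve("bob")

# The operations are exchanged asynchronously afterwards.
site_1.apply_remote(*booking_2)
site_2.apply_remote(*booking_1)
```

The two replicas end up in identical states, so convergence holds, yet both customers hold seat 1A. A single end point that takes a lock before answering would have given the second customer a different seat.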

I think of a centralised DB as a real device - an end point, perhaps at a particular URL that you can send messages to. By contrast a distributed DB managed with OT allows each site to have its own divergent copy of the data. This is ideal for some applications, but not for others.

In the seat reservation example there is a real plane and real people who will occupy the seats. Therefore a centralised DB at a known end point using pessimistic locking makes sense.

> > However, collaborative, decentralised management of a company's
> > geological data would be reasonable.
>
> I don't really understand this either. The only fleshed out OT protocol
> I've seen thus far concerns text files and rejects/annuls every
> transaction which has to do with the same file offset and symbol, within
> a network roundtrip time. A typical geological dataset would be composed
> of far more numerous data points, true, so per update write contention
> would be less. But normalizing for size, I would imagine that the
> dataset would be even more rigid and unforgiving against this sort of
> compensation, because it does not possess the global symmetry that a
> text file (an element of a string monoid) does. Instead, its rigidity,
> borne of structural/semantic asymmetry, would probably give rise to more
> edit conflicts and hence more annulled transactions per a granule of
> time than for a long string.

I'm not talking about performance, but rather the philosophical difference between "device" and "data". It's a question of asking whether divergence is permissible in the application. For example, it is useful to allow branching and merging of source code by developers working on the same software project, but not with aeroplane seat reservations.

> At least this is what happens for your average, structured document
> under the normal timestamped replication protocols. I've never seen any
> way around the problem other than to exploit higher level semantics.

Cheers,
David

Received on Sat Dec 16 2006 - 06:33:45 CET
