Re: Concurrency in an RDB

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Mon, 18 Dec 2006 17:28:03 +0200
Message-ID: <Pine.SOL.4.62.0612181517090.22950_at_kruuna.helsinki.fi>


On 2006-12-18, David wrote:

> Consider that 50% of the individual transactions break the constraint.
> Then batching won't help. Eg try a batch of size 2. Expectation
> value of number of compiles per batch = 0.25*1 + 0.25*2 + 0.5*3 =
> 2.25, which is actually worse than not batching at all.

Under the rule I gave you, you'd fall back to testing both transactions serially, so you wouldn't outright lose anything: the failed batch compile still tells you that if the first transaction tests fine, the second must be the faulty one, which is where the 2-compile term in your expectation comes from. But the example does prove my thinking wrong.

I'd say the gain isn't related to batch size but to the amount of zeroth-order statistical redundancy in the faulty/ok bitstream. It's also not going to scale past a 1/2 probability of error, because no oracle is available for picking out an okay transaction from a set. In practice the approach will still be usable: most transactions are correct. You'll just have to get the rest of the throughput by throwing hardware at the problem; thankfully it's still embarrassingly parallel. Latency then becomes compute bound.
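
To make the arithmetic concrete, here's a minimal Python sketch of that fall-back rule; checks_for_batch, expected_checks and the probabilities are my own illustration, not anything a real DBMS exposes:

    import random

    def checks_for_batch(ok):
        # One constraint compile for the whole batch.
        checks = 1
        if all(ok):
            return checks            # batch passed outright
        # Batch failed: compile the transactions serially.  If every
        # transaction but the last tested fine, the last one has to
        # be the faulty one, so its compile can be skipped.
        for i in range(len(ok)):
            if i == len(ok) - 1 and all(ok[:i]):
                break
            checks += 1
        return checks

    def expected_checks(p_bad, n, trials=100_000):
        # Monte Carlo estimate of compiles per batch of size n when
        # each transaction independently fails with probability p_bad.
        total = 0
        for _ in range(trials):
            ok = [random.random() >= p_bad for _ in range(n)]
            total += checks_for_batch(ok)
        return total / trials

    print(expected_checks(0.5, 2))    # ~2.25, your figure: worse than 2
    print(expected_checks(0.05, 2))   # ~1.15: batching wins when errors are rare

At p = 1/2 the faulty/ok stream carries no redundancy left to exploit, which is the scaling limit I mean above.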

> I thought in practise a good DB application would never keep
> transactions open while dialogs are displayed for accepting user
> input.

They often do. That's really the so-called long-running-transaction problem in a nutshell: if the DBMS doesn't understand application semantics, all it can do to preserve integrity is note which data the application has seen, assume the worst case, namely that the entire write set could depend on the entire read set, and so enforce serializability over the transaction's whole read-write set, no matter how large or long-lived it is.
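
The anti-pattern is trivially easy to write, too. A toy sketch in Python with sqlite3, with an invented patient/ward schema, where the engine's write lock sits across a blocking prompt:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, ward TEXT)")
    con.execute("INSERT INTO patient VALUES (42, 'ER')")
    con.commit()

    # The update below implicitly opens a transaction; the write lock
    # it takes is then held across the blocking call while the user
    # makes up their mind, and every conflicting reader or writer
    # queues up behind it.
    con.execute("UPDATE patient SET ward = 'ICU' WHERE id = 42")
    input("Confirm transfer and press enter: ")
    con.commit()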

> However, IMO if the patient information relates more to "data" than
> "device" then it is better to use an asynchronous data entry model and
> provide powerful data validation based on read only queries that pick
> up duplicate data entry errors etc.

Then you're actually going not with locking or even optimism but with open transactions: you can't be sure the data you just entered will *really* stick until the validation has run and succeeded.
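
A minimal sketch of what that model reduces to, again in Python/sqlite3 and with an invented entry table and status column; nothing is authoritative until the read-only validation pass has promoted it:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        patient_no TEXT,
        payload TEXT,
        status TEXT DEFAULT 'pending')""")

    # Data entry commits immediately -- no locks held while typing --
    # but the rows are only tentatively in.
    con.execute("INSERT INTO entry (patient_no, payload) VALUES ('P-17', 'a')")
    con.execute("INSERT INTO entry (patient_no, payload) VALUES ('P-17', 'b')")
    con.commit()

    # The validation pass is a read-only query over the pending rows;
    # only after it has run do we learn whether the entry really stuck.
    dupes = [pno for (pno,) in con.execute(
        """SELECT patient_no FROM entry WHERE status = 'pending'
           GROUP BY patient_no HAVING COUNT(*) > 1""")]
    for pno in dupes:
        con.execute("UPDATE entry SET status = 'flagged' WHERE patient_no = ?",
                    (pno,))
    con.execute("UPDATE entry SET status = 'valid' WHERE status = 'pending'")
    con.commit()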

> Branching, merging tagging and so on can be used to distinguish a
> valid DB from a work in progress.

In theory, yes, but in practice you're then committing to maintaining enough context (in the DB) and expertise (around it) to merge the data after the fact, instead of forcing the DB to go from one valid state to another and aiming for maximal concurrency achievable within that constraint. From some personal experience with Lotus Notes, I can tell you that the result can be painful.

Secondly, it's reasonable that you should be able to exploit semantics for concurrency and performance when they're known. But you seem to be advocating at least one protocol which *requires* the availability of semantic knowledge, so that merging can be automated. To me that seems more like ethics than engineering.

-- 
Sampo Syreeni, aka decoy - mailto:decoy_at_iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
