Re: Concurrency in an RDB

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Mon, 18 Dec 2006 13:02:22 +0200
Message-ID: <Pine.SOL.4.62.0612180916250.14418_at_kruuna.helsinki.fi>

On 2006-12-17, David wrote:

> In OT there is never a need to report failure. Transactions are
> guaranteed to succeed! A user makes a local modification on local
> state without any risk of a dirty read w.r.t the local database.

Yes, but in the general case there is a risk of dirty reads to remote state. It's being hidden under a symmetry assumption, which effectively says that whatever the protocol can produce under such circumstances must be a valid state of the distributed DB, but the restriction is still there. It shows up in that OT isn't generally applicable, and in that (the corrected version of) dOPT annuls some transactions, i.e. reduces to an OCC protocol which can only guarantee durability after it's completed a kind of validation round.

> I agree that with some effort it may be possible to increase
> performance. Note that if errors are common then this approach may not
> help much.

By parallelization, adaptive choice of the coding algorithm you use to trace faulty transactions (i.e. you try independent sets of compatible updates so that the probability of success is approximately 1/2) and temporal interleaving, I think you can get pretty much arbitrary throughput, with latency dominated by batch buildup.

> Your approach means that a transaction appears to commit even though
> in actual fact it may be discarded. i.e. it is breaking the
> durability assumption commonly expected on a DBMS. This will prevent
> it being used for distributed transactions. Do "open transactions"
> address this?

Yes. The locking protocol we're discussing prevents any action which would be incompatible with the pending compensation from proceeding. At the same time compatible, commutative processes benefit from a higher level of concurrency because they proceed independently. The point of SCC is that ACID properties hold semantically, eventhough they're relaxed with respect to low level syntax, and this is used to allow more concurrency.

> Is the flag for whether Bill has had surgery, a functional dependency
> on other state, and therefore doesn't deserve to be part of the data
> model?

No. For example, a receptionist might want to (weakly) authenticate Bill based on some of the shared knowledge they have on Bill's history. After that, the knowledge that he's had surgery is previously unknown to the system and couldn't have been deduced from its contents. It's just that the new DB state after the update is jointly dependent on both novel input *and* the previous state, and the precise dependency is unknown to the DB because it's being implemented by an ad hoc human user.

> Functionally dependent state can be very expensive to calculate, and
> may need to persist in order to avoid unnecessary recalculation. But
> I regard that as a caching issue.

Certainly, but in those terms we're just talking cache coherency. That's basically the same issue as before: how to guarantee distributed atomicity and consistency.

>> In your paradigm you'd probably want to make such updates idempotent.
>> But that's risky, because it calls for precision that human record
>> keepers simply don't have, and it limits the kind of ad hoc work that is
>> often necessary in such business.
>
> I'm not sure what you mean here. Assignment (with the same value) is
> idempotent.

Assignment is, yes, but people are exceedingly bad at certain kinds of complicated assignment. For example, Bill's surgeries might be modelled as a set of timestamped events. Then you have a choice between set assignment and if-exists-then-nop-else-insert style inclusion. If you let people use such primitives, they're going to mess up the database because they're bad at entering precise facts. The only way you're going to have any hope of not duplicating some of those timestamps in the long run is if you let people check which dates are already there, and proceed from there. That calls for remote modifications and locking.

> 2. only employ simple, local integrity constraints compatible with
> fast transactions that don't need to block on network I/O.
>
> In the second case the system is far more robust because
> "inconsistencies" between geographically separated sites is explicitly
> allowed for in the design. Furthermore, the parts of the system are
> more autonomous (and asynchronous).

Well, sure, but now you're once again assuming the problem away. Locking and integrity constraints are means of managing complexity in data and update patterns. You can't really avoid them if your data and access patterns happen to require them. If you try, you'll be limiting the applicability of your system, and that makes the system far less valuable to the user. Often in O&M, the database and the network environment are there *precisely* so that one station can interface with another across the physical production line, and make decisions based on distributed input. The data processing side is there to manage the complexity, permit feedback control with large numbers of variables, provide possibilities for different synchronization policies so that lockstep operation and large buffers within the production line can be avoided simultaneously, to provide modularity, flexibility and configurability, and so on. That's what the production process needs, so that's what you'll have to provide. In O&M, it's the line designers who're making all the money, not the DB guy, so you don't really get to manage the DB environment any which way.

> Note as well that a distributed integrity constraint may indicate a
> denormalised design. Can the redundancy be avoided?

What redundancy? The only part in my example that could possibly be redundant are the production statistics. They're neither essential to my argument, nor can you always avoid them even if redundant. I mean, we've already gone over the benefits of isolating long running read transactions from the update workload. You can't do that unless you have a redundant copy of the data somewhere for the reads to use.

-- 
Sampo Syreeni, aka decoy - mailto:decoy_at_iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Mon Dec 18 2006 - 12:02:22 CET

This message: [ Message body ]
Next message: Sampo Syreeni: "Re: Concurrency in an RDB"
Previous message: David: "Re: Concurrency in an RDB"
In reply to David: "Re: Concurrency in an RDB"
Next in thread: Neo: "Re: Generalised approach to storing address details"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message