Re: Concurrency in an RDB

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Sat, 16 Dec 2006 18:39:32 +0200
Message-ID: <Pine.SOL.4.62.0612161808460.19722_at_kruuna.helsinki.fi>


On 2006-12-15, David wrote:

> In a fully normalized database surely it makes it more not less likely
> for mutative transactions to be simple and fine-grained.

Correct, and I think I'm finally getting your point. You seem to be viewing transactions at a somewhat higher level than common DBMSs do. From the DBMS's point of view, we commonly expect even the simplest updates to be of the read, modify remotely, write back kind. Your viewpoint seems to be that, whenever possible, the transaction should consist of passing the new value to the DBMS, which then executes the transaction locally and reports success or failure. In the first case there is a data dependency between the read and the write, so you cannot avoid locking the data for extended periods of time, except perhaps by optimism. In the second case the transaction executes entirely inside the DBMS and the read-write latency is determined only by how fast the CPU can handle it; when you force the transaction to operate blindly (you say "on constant data"), you also make it spoolable, which of course makes it easy on the locking subsystem.
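
To make the contrast concrete, here is a minimal sketch of the two update styles in Python over SQLite (the account table is invented purely for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, 100)")
    conn.commit()

    # Read, modify remotely, write back: the new value depends on data shipped
    # to the client, so the row has to stay locked (or be re-validated) across
    # the whole round trip.
    balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
    conn.execute("UPDATE account SET balance = ? WHERE id = 1", (balance - 10,))
    conn.commit()

    # "Blind" update: the whole change is handed to the engine in one statement,
    # so it can be spooled and applied without holding locks across any client
    # round trip.
    conn.execute("UPDATE account SET balance = balance - 10 WHERE id = 1")
    conn.commit()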

I do understand the point, and when you can optimize for such constraints, the result can be very efficient. However, I think the important part is the principle that governs the interface, not DBMS internals where the interfaces are so limited that you can always optimize for spoolability, or for CPU loading for that matter. I mean, there are still lots of applications where you cannot expect to be working blind, or where the business logic cannot be pushed inside the DBMS the way the design would require. Again, it's this more general case that DBMS designers are designing for. Sometimes you might be able to make do with less, and it might even make sense to switch to coarser grained locks when your assumptions hold, but if you count on that, the DBMS is bound to become unstable when the first longer running transaction comes along.

> Here is a non-relational example showing that strong integrity
> constraints can be lousy. Imagine a DB that stores source code, and
> there is an integrity constraint that the source must all compile
> successfully. This will make mutative transactions *very* expensive -
> so don't expect a high transaction throughput.

Not necessarily true. You could batch the updates and leave the update transactions pending for longer, or go with open transactions and compensate by reporting failure later on. That way many modifications can be combined, and most of the time the compilation will succeed. When it doesn't, you can use a traitor tracing style search to find the precise transactions that caused the failure, roll them back, and retry.
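
A sketch of the tracing part, under the assumption that the database state is just a dict, each pending update is a function that mutates it, and the compile step is a single check run over the whole batch:

    def apply_batch(state, updates, check):
        """Apply a batch of updates, run the expensive check once, and on
        failure bisect the batch to isolate the offending updates."""
        candidate = dict(state)
        for u in updates:
            u(candidate)
        if check(candidate):
            return candidate, []
        if len(updates) == 1:
            return state, list(updates)        # lone offender: roll it back
        mid = len(updates) // 2
        state, bad_left = apply_batch(state, updates[:mid], check)
        state, bad_right = apply_batch(state, updates[mid:], check)
        return state, bad_left + bad_right

    # contrived usage: eight batched updates, one of which breaks the invariant
    updates = [lambda s, k=k: s.update({k: k}) for k in range(8)]
    updates[5] = lambda s: s.update({"bad": None})
    ok = lambda s: None not in s.values()
    state, rejected = apply_batch({}, updates, ok)     # rejected == [updates[5]]

The happy path costs one check for the whole batch; only a failing batch pays for the extra checks, and even then the rest of the updates go through.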

>> Try your example transaction, with the result cached for reuse. It makes
>> sense precisely when the query load is read-mostly and complex queries
>> recur.
>
> Sorry, I don't understand you.

"Find the grandparents of people who have a brother in law that had open heart surgery before 2001." As such it's a relatively heavy read transaction. If you also want to store the result back into the database, it becomes a long running read-write transaction. The only way you're going to get any concurrency in that case is if you use finer grained locks or scale back on integrity guarantees.

>> How about "Gimme everything you have on Bill. [For thirty seconds, draw
>> forms and wait for the user to generate some unpredictable data
>> dependencies.] Change Bill's wife to Mary Long."?
>
> Is this a user who analyses info on Bill before deciding to change
> Bill's wife?

Yes.

>Isn't the change to Bill's wife a separate txn?

Not necessarily, no. But the example is going to get contrived, so let's try another one. "Gimme everything medical you have on Bill. [Checks for inconsistencies with paper documentation.] Mark Bill as having had surgery." In this case you can't easily avoid locking, because whether you want to perform the modification does depend on whether Bill has already been so marked. Not locking could lead to inconsistency if somebody modified Bill's patient record in the meantime.
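
The optimistic alternative I mentioned earlier would look roughly like this: instead of holding a lock while the clerk leafs through the paper file, the write succeeds only if the record is still in the state the clerk last saw. (Python over SQLite again; the patient table and its version column are made up for the sketch.)

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY,"
                 " had_surgery INTEGER, version INTEGER)")
    conn.execute("INSERT INTO patient VALUES (1, 0, 0)")
    conn.commit()

    # The clerk pulls up Bill's record, then spends a while with the paper file.
    seen_version = conn.execute(
        "SELECT version FROM patient WHERE id = 1").fetchone()[0]

    # Optimistic write: mark the surgery only if nobody has touched the record
    # since the clerk read it; otherwise report a conflict instead of silently
    # overwriting the concurrent change.
    cur = conn.execute(
        "UPDATE patient SET had_surgery = 1, version = version + 1"
        " WHERE id = 1 AND version = ?", (seen_version,))
    conn.commit()
    if cur.rowcount == 0:
        print("conflict: Bill's record changed in the meantime; re-read and retry")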

In your paradigm you'd probably want to make such updates idempotent. But that's risky, because it calls for a precision that human record keepers simply don't have, and it limits the kind of ad hoc work that is often necessary in this sort of business.

> If I understand you this has complex, distributed integrity
> constraints, which I think is a bad idea. What happens if a QA station
> crashes and rolls back to an earlier state?

You're going to mark that batch as being in an unknown condition, or sometimes you might have a pickup line that recycles items affected by such a failure back to the beginning of the QA line. Still, this is a very typical O&M scenario.

-- 
Sampo Syreeni, aka decoy - mailto:decoy_at_iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Sat Dec 16 2006 - 17:39:32 CET
