Re: Concurrency in an RDB

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Fri, 15 Dec 2006 17:50:08 +0200
Message-ID: <Pine.SOL.4.62.0612151459340.6274_at_kruuna.helsinki.fi>


On 2006-12-15, David wrote:

> By contrast I claim that mutative work, by its very nature is not
> "algorithmic" (CPU intensive) and therefore, as long as I/O or network
> latency can be avoided, it is possible for a computer to easily
> generate enough data changes to saturate the writing to disk.

If the only limiting factor for the length of read-write dependencies in transactions is CPU capacity, so that disk, network, memory and other latencies can be ignored, and the load isn't heavy enough to require parallelization for CPU capacity, then this follows. I just can't imagine a well-designed database application where this would be the case -- in a fully normalized database all data are maximally independent of each other, so you won't be changing one value solely based on another. The only viable update transaction remaining after that is of the fine-grained spoolable kind. But where are all of these constraints going to be satisfied simultaneously? Not in any OLTP environment I've ever heard of.

In OLAP environments we can have something similar. But then we have update strategies available that are even better than yours. For example, take a look at Stonebraker et al.'s C-Store, or the various bulk materialized view maintenance algorithms out there. The basic idea is to:

 1. separate long-running reads onto a second copy of the data,
 2. partition the write transactions into sufficiently many mutually
    disjoint working sets (easy when they're guaranteed to be single-row
    upserts, which are a prominent example of the kind of fine-grained
    spoolable transaction you're probably conceptualizing this with),
 3. serialize the transactions within each set and run them without
    locking,
 4. parallel sort the resulting updates into physical order,
 5. merge sweep them in batch into the read partitions, and
 6. atomically swap the updated versions into use.
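
To make the shape of that concrete, here's a toy, single-process Python sketch of the batching idea. Everything in it -- the function names, the hash partitioning, the in-memory lists -- is my own simplification for illustration, not how C-Store actually stores or schedules anything:

READ_COPY = []       # read-optimized copy: list of (key, value), sorted by key

def partition(upserts, n_sets):
    """Hash-partition single-row upserts into mutually disjoint working
    sets, so each set can be applied serially without locking against
    the others."""
    sets = [dict() for _ in range(n_sets)]
    for key, value in upserts:
        sets[hash(key) % n_sets][key] = value   # last upsert for a key wins
    return sets

def merge_sorted(base, updates):
    """One merge sweep over two key-sorted runs; an update overrides the
    base row with the same key."""
    out, i, j = [], 0, 0
    while i < len(base) and j < len(updates):
        if base[i][0] < updates[j][0]:
            out.append(base[i]); i += 1
        elif base[i][0] > updates[j][0]:
            out.append(updates[j]); j += 1
        else:
            out.append(updates[j]); i += 1; j += 1
    return out + base[i:] + updates[j:]

def refresh(upserts, n_sets=4):
    """Apply a batch of upserts, then swap the new read copy into use."""
    global READ_COPY
    sets = partition(upserts, n_sets)
    updates = sorted(kv for s in sets for kv in s.items())  # stand-in for the parallel sort
    READ_COPY = merge_sorted(READ_COPY, updates)            # the rebinding is the atomic swap

refresh([(3, "c"), (1, "a"), (2, "b"), (1, "a2")])
print(READ_COPY)    # [(1, 'a2'), (2, 'b'), (3, 'c')]

The swap at the end is what buys you the cheap long-running reads: a reader always sees either the old copy or the new one, never a half-merged state.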

Unfortunately, the only reason such strategies work is that the very first OLTP database in the chain was able to handle the general case, so the bulk of the serialization work is already done.

>>> I claim that virtually all CPU intensive tasks fit well into the idea
>>> of calculation of outputs on read only inputs. For example a query on
>>> a set of tables used to extract information doesn't modify any of the
>>> (persistent) tables. The query "Find the grandparents of people who
>>> have a brother in law that had open heart surgery before 2001" has to
>>> examine many database records.
>>>
>>> By contrast, mutative changes to an RDB tend to be very simple and
>>> don't require much CPU effort.
>>
>> This generalization is sufficiently broad as to be simply false. Some
>> writes are simple; some are not.
>
> Please outline an example. That is why I posted!

Try your example transaction, with the result cached for reuse. Once the cache exists, any write touching the underlying tables also has to maintain or invalidate it, so the "simple" write inherits the cost of the complex read. Caching like that makes sense precisely when the query load is read-mostly and complex queries recur.
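
To put that in code, here's a minimal sqlite3 sketch. The schema, the cached_report table and the function names are all invented for the illustration; the only point is that the insert can no longer commit without doing the report's work too (here by brute-force recomputation -- a real system would at least maintain it incrementally):

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE surgery (person TEXT, kind TEXT, year INT);
    -- cached result of an expensive, recurring read-only query
    CREATE TABLE cached_report (kind TEXT PRIMARY KEY, patients INT);
""")

def refresh_cache(conn):
    """Recompute the cached result of the expensive report query."""
    conn.execute("DELETE FROM cached_report")
    conn.execute("""
        INSERT INTO cached_report
        SELECT kind, COUNT(DISTINCT person) FROM surgery
        WHERE year < 2001 GROUP BY kind
    """)

def record_surgery(conn, person, kind, year):
    """The supposedly simple write: with the cache in place it also has
    to keep the cached result consistent, inside the same transaction."""
    with conn:
        conn.execute("INSERT INTO surgery VALUES (?, ?, ?)",
                     (person, kind, year))
        refresh_cache(conn)

record_surgery(db, "Bill Heard", "open heart", 1999)
print(db.execute("SELECT * FROM cached_report").fetchall())  # [('open heart', 1)]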

>>> For example
>>>
>>> "Add person Bill Heard born in 1970, who married Janet Shaw in
>>> 1996"
>>>
>>> Furthermore they don't depend on the results of time consuming read
>>> only queries.
>>
>> Again, this is true of some applications but not of others.
>
> If you have experience with that I would be interested.

How about "Gimme everything you have on Bill. [For thirty seconds, draw forms and wait for the user to generate some unpredictable data dependencies.] Change Bill's wife to Mary Long."? Or "Give me manufacturing details on part x. [Check whether it or any of its subparts were marked defective by the preceding QA stations, which run quasisynchronously. If so, route to trash.] Mark as disposed of. [Else route to packaging.] Mark as routed to packaging and update output statistics."?

-- 
Sampo Syreeni, aka decoy - mailto:decoy_at_iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2