Re: Concurrency in an RDB

From: David <davidbl_at_iinet.net.au>
Date: 15 Dec 2006 17:36:39 -0800
Message-ID: <1166232999.680303.197770_at_t46g2000cwa.googlegroups.com>


Sampo Syreeni wrote:
> On 2006-12-15, David wrote:
>
> > By contrast I claim that mutative work, by its very nature is not
> > "algorithmic" (CPU intensive) and therefore, as long as I/O or network
> > latency can be avoided, it is possible for a computer to easily
> > generate enough data changes to saturate the writing to disk.
>
> If the only limiting factor for the length of read-write dependencies in
> transactions is CPU capacity, so that disk, network, memory and other
> latencies can be ignored, and the loading isn't heavy enough to require
> parallelization for CPU capacity, then this follows. I just can't
> imagine a well-designed database application where this would be the
> case -- in a fully normalized database all data are maximally
> independent of each other, so you won't be changing one value solely
> based on another.

Surely a fully normalized database makes it more, not less, likely that mutative transactions are simple and fine-grained. Or do I misunderstand you?

> The only viable update transaction remaining after
> that is of the fine-grained spoolable kind. But where are all of these
> constraints going to be satisfied simultaneously? Not in any OLTP
> environment I've ever heard of.

I agree that the lack of strong integrity constraints is a critical assumption that I make. As it turns out, Operational Transform also implies relatively weak integrity constraints.

In a fashion, my question really is this: in a large class of RM applications, is it reasonable to have integrity constraints weak enough that mutative work can be done with minimal CPU load?

Here is a non-relational example showing that strong integrity constraints can be lousy. Imagine a DB that stores source code, with an integrity constraint that all of the source must compile successfully. This makes mutative transactions *very* expensive - so don't expect a high transaction throughput. Furthermore, the constraint hurts concurrency, because every mutative transaction must lock the entire source while it performs a build. Worse still, with strict 2PL many transactions will deadlock.
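
To make the contrast concrete, here is a rough Python sketch (the names and the toy "build" are made up purely for illustration - a cheap syntax check stands in for a compile that would really take minutes):

  import threading

  repo_lock = threading.Lock()   # with strict 2PL every writer ends up serialized here
  source = {}                    # file name -> source text

  def cheap_check(text):
      # Stand-in for a local, per-value check (cheap, proportional to the change).
      try:
          compile(text, "<candidate>", "exec")
          return True
      except SyntaxError:
          return False

  def whole_repo_builds(files):
      # Stand-in for the "everything must compile" constraint: the whole
      # database is re-validated on every commit (CPU-heavy).
      return all(cheap_check(t) for t in files.values())

  def commit_strong(name, text):
      with repo_lock:                     # entire source locked for the duration of a build
          candidate = dict(source)
          candidate[name] = text
          if not whole_repo_builds(candidate):
              raise ValueError("constraint violated")
          source.clear()
          source.update(candidate)

  def commit_weak(name, text):
      if not cheap_check(text):           # only the changed value is inspected
          raise ValueError("rejected")
      with repo_lock:                     # lock held only for the tiny write itself
          source[name] = text

The point is only that commit_strong holds the lock across a CPU-bound re-validation of everything, so writers serialize behind the constraint check, whereas commit_weak holds it just long enough to install one value.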

With more relaxed integrity constraints, mutative transactions can become so fast that a complex locking system actually reduces performance.
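
Concretely, "very fast" means something like this sketch (Python again, hypothetical names): fine-grained mutations are spooled through a queue to a single writer thread, so there is no row-level locking at all and throughput is limited by how fast the writer can flush, not by the CPU:

  import queue
  import threading

  mutations = queue.Queue()     # spool of fine-grained write transactions
  table = {}                    # key -> row; the "database"

  def writer():
      # One thread owns all mutation. Serial application means no locks and
      # no deadlock detection; in a real engine this loop would append to
      # the log and flush, which is where the time actually goes.
      while True:
          key, row = mutations.get()
          if key is None:       # sentinel: shut down
              break
          table[key] = row

  def client(n):
      # Producers just describe the change; they never touch the data directly.
      mutations.put(("person-%d" % n, {"name": "Bill Heard", "born": 1970}))

  w = threading.Thread(target=writer)
  w.start()
  clients = [threading.Thread(target=client, args=(i,)) for i in range(4)]
  for t in clients:
      t.start()
  for t in clients:
      t.join()
  mutations.put((None, None))   # tell the writer to stop
  w.join()

Of course this only works because each mutation is cheap and self-contained; the moment a write has to read and validate against lots of other data, you are back to the source-code example above.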

> In OLAP environments we can have something similar. But then we have
> update strategies available that are even better than yours. For
> example, take a look at Stonebraker et al's C-store, or the various bulk
> materialized view maintenance algorithms out there. The basic idea is to
> separate long running reads to a second copy, partition write
> transactions into sufficiently many mutually disjoint working sets (easy
> when they're guaranteed to be single row upserts, which are a prominent
> example of the kind of fine-grained spoolable transaction you're
> probably conceptualizing this with), serialize the transactions within
> each set, run them without locking, then parallel sort the resulting
> updates in physical order, merge sweep them in batch to the read
> partitions, and finally atomically swap the updated versions into use.
>
> Unfortunately the only reason such strategies work is because the very
> first OLTP database in the chain was able to handle the general case, so
> that the bulk of the serialization work is already done.

I agree that avoiding contention between readers and writers is very useful. I presume that the result of a read that takes an excessive amount of time can appear rather outdated by the time it returns.
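
That staleness is inherent in the swap-in approach; a rough sketch of the idea in Python (hypothetical names, an in-memory dict standing in for the read partition):

  import threading

  snapshot = {"bill": {"wife": "Janet Shaw"}}   # the current read-only version
  publish = threading.Lock()

  def long_running_read():
      view = snapshot                # take whatever version is current *now*
      # ... minutes of analysis over 'view' ...
      return view["bill"]["wife"]    # may well be outdated by the time it returns

  def apply_batch(updates):
      # Writers build a fresh copy and swap it in; readers never block them.
      global snapshot
      with publish:                  # writers serialize only among themselves
          new_version = {k: dict(v) for k, v in snapshot.items()}
          for key, row in updates:
              new_version[key] = row
          snapshot = new_version     # atomic reference swap

The long read keeps working against the version it grabbed at the start, so it never blocks (or is blocked by) the batch writer, but its answer can describe a state of the database that no longer exists.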

> >>> I claim that virtually all CPU intensive tasks fit well into the idea
> >>> of calculation of outputs on read only inputs. For example a query on
> >>> a set of tables used to extract information doesn't modify any of the
> >>> (persistent) tables. The query "Find the grandparents of people who
> >>> have a brother in law that had open heart surgery before 2001" has to
> >>> examine many database records.
> >>>
> >>> By contrast, mutative changes to an RDB tend to be very simple and
> >>> don't require much CPU effort.
> >>
> >> This generalization is sufficiently broad as to be simply false. Some
> >> writes are simple; some are not.
> >
> > Please outline an example. That is why I posted!
>
> Try your example transaction, with the result cached for reuse. It makes
> sense precisely when the query load is read-mostly and complex queries
> recur.

Sorry, I don't understand you.

> >>> For example
> >>>
> >>> "Add person Bill Heard born in 1970, who married Janet Shaw in
> >>> 1996"
> >>>
> >>> Furthermore they don't depend on the results of time consuming read
> >>> only queries.
> >>
> >> Again, this is true of some applications but not of others.
> >
> > If you have experience with that I would be interested.
>
> How about "Gimme everything you have on Bill. [For thirty seconds, draw
> forms and wait for the user to generate some unpredictable data
> dependencies.] Change Bill's wife to Mary Long."?

Is this a user who analyses info on Bill before deciding to change Bill's wife? Isn't the change to Bill's wife a separate txn?

> Or "Give me
> manufacturing details on part x. [Check whether it or any of its
> subparts were marked defective by the preceding QA stations, which run
> quasisynchronously. If so, route to trash.] Mark as disposed of. [Else
> route to packaging.] Mark as routed to packaging and update output
> statistics."?

If I understand you correctly, this involves complex, distributed integrity constraints, which I think are a bad idea. What happens if a QA station crashes and rolls back to an earlier state?

Cheers,
David
