Re: Concurrency in an RDB

From: Marshall <marshall.spight_at_gmail.com>
Date: 13 Dec 2006 18:27:43 -0800
Message-ID: <1166063263.095450.22030_at_73g2000cwn.googlegroups.com>


On Dec 7, 6:14 pm, "David" <davi..._at_iinet.net.au> wrote:
>
> Consider that a single mutex is used to protect an entire RDB.

*cough*

> This
> mutex offers shared read / exclusive write access modes. It avoids
> writer starvation by blocking further readers once one or more writer
> threads are waiting on the mutex.

If you have shared read / exclusive write, then you have no mechanism for controlling what database state(s) an application may see in a single transaction.

Consider:
begin read transaction 1
perform write transaction 2
another read in transaction 1
perform write transaction 3
read and finish transaction 1

Writers didn't block readers, but did block each other. Deadlock is impossible. However, transaction 1 has seen three different database states.

If you can only control writers, you can't have atomic reads. Sometimes you must have atomic reads.
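
To make that concrete, here is a minimal sketch of my own (not anything
you posted), with Java's ReentrantReadWriteLock standing in for your
mutex; fair mode roughly matches your no-starvation rule, queueing new
readers behind waiting writers. Every individual operation is
protected, yet a two-read "transaction" can still straddle a commit:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class GlobalLock {
        // One fair read/write lock guards the whole "database";
        // fair mode queues new readers behind waiting writers.
        static final ReentrantReadWriteLock rwl =
            new ReentrantReadWriteLock(true);
        static int balance = 0;  // stand-in for the entire database

        static int read() {
            rwl.readLock().lock();
            try { return balance; } finally { rwl.readLock().unlock(); }
        }

        static void write(int delta) {
            rwl.writeLock().lock();
            try { balance += delta; } finally { rwl.writeLock().unlock(); }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread reader = new Thread(() -> {
                int first = read();   // begin "read transaction 1"
                int second = read();  // second read, same transaction
                // A writer may commit between the two reads, so first
                // and second can disagree: the read was never atomic.
                System.out.println(first + " vs " + second);
            });
            Thread writer = new Thread(() -> write(100)); // "transaction 2"
            reader.start(); writer.start();
            reader.join(); writer.join();
        }
    }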

> I'm basically suggesting conservative rather than strict 2PL. This
> may sound heretical, or at least far too simplistic (and inefficient)
> to be useful in an industrial strength RDB. However I contend that
> the approach deserves to be analyzed in some detail. I suspect it is a
> case of "less is more".

It might be useful for some narrow class of applications that don't need atomic reads. I can't imagine under what circumstances one would have shared data structures and *not* need atomic reads, but who knows? Maybe there are data structures that receive a bunch of small writes and are only read later by some offline process. Log files?

It is clearly not sufficient for a general-purpose data manager, however.

> With such coarse level locking the CPU overheads of gaining and
> releasing locks are minimized (simply because locks are opened so
> infrequently).
>
> Exclusive write access to the entire RDB means there can never be
> dead-lock. This eliminates a lot of complexity and overhead. For
> example there is no need for dead-lock detection such as by looking for
> a cycle in the wait-for graph. There is also no need to select a
> "victim" transaction, abort it and then later restart it from
> scratch.

I believe the qualities you describe (deadlock freedom, simpler implementation) would in fact obtain. However, the inability to observe a consistent database state seems a severe price; not worth it from my point of view.
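
For the record, the machinery you get to delete is small. A hedged
sketch of the usual detector (my code, assuming each blocked
transaction waits on at most one lock holder): deadlock detection is
nothing more than finding a cycle in the wait-for graph.

    import java.util.*;

    class WaitForGraph {
        // Wait-for graph: each blocked txn waits on at most one lock
        // holder, so a Map suffices. A deadlock is exactly a cycle.
        static boolean hasDeadlock(Map<Integer, Integer> waitsFor) {
            for (Integer start : waitsFor.keySet()) {
                Set<Integer> seen = new HashSet<>();
                Integer cur = start;
                while (cur != null && seen.add(cur)) {
                    cur = waitsFor.get(cur);  // follow the wait-for edge
                }
                if (cur != null) return true; // revisited a txn: cycle
            }
            return false;
        }
    }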

> In some database applications repeated dead-lock scenarios occur, and
> the database can become very inefficient because transactions are
> continually aborted and rolled back. This is a rather insidious
> problem because a database can work fine under lighter loads, then
> suddenly take on pathological behavior when the load increases
> slightly.
>
> In a shared read mode we get the ultimate in concurrency.

Um, no. Consider a simple scenario in which two processes each write regularly to their own table: under your system there would be contention, but under something even as simple as table-level locking there would not be.
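
A sketch of what I mean, with two made-up tables of my own invention:
give each table its own lock and the two writers never meet, whereas
your single mutex serializes them.

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class TableLocks {
        // One lock per table instead of one for the whole database.
        // Writers to different tables now proceed in parallel; writers
        // to the same table still exclude each other.
        final ReentrantReadWriteLock ordersLock = new ReentrantReadWriteLock(true);
        final ReentrantReadWriteLock logLock    = new ReentrantReadWriteLock(true);

        void appendOrder(Runnable write) {
            ordersLock.writeLock().lock();
            try { write.run(); } finally { ordersLock.writeLock().unlock(); }
        }

        void appendLog(Runnable write) {
            logLock.writeLock().lock();
            try { write.run(); } finally { logLock.writeLock().unlock(); }
        }
    }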

> We allow any
> number of threads to have unrestricted access to all data in the
> database, without any additional locking (i.e. apart from the initial
> lock). It is fairly easy to support shared use of caches with LRU
> eviction policies. Data loaded from disk represents cached state, as
> well as all calculated (or "derived") state. So it is easy to
> allow different threads to share in the calculated results that they
> generate. This makes it quite easy to parallelise read-only
> algorithms across all processors in multi-core machines. It also makes
> it easy to ensure that processors work concurrently with I/O.

Parallelizing read-only algorithms isn't a big accomplishment.
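
It mostly falls out for free. A throwaway example (mine): summing a
read-only array needs no lock at all, global or otherwise, precisely
because nothing mutates while it runs.

    import java.util.Arrays;

    class ParallelRead {
        // Read-only work parallelizes trivially: no shared mutable
        // state means no synchronization of any kind is required.
        static long sum(long[] balances) {
            return Arrays.stream(balances).parallel().sum();
        }
    }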

> I like the analogy to functional programming. This emphasizes the idea
> that outputs are (completely) calculated from read-only inputs. It is
> well known that FP offers one of the best ways to achieve concurrency
> in computer science.

I'm an admirer of functional programming, yet I have a hard time with the above claim. Especially the "well known" part. Care to cite?

> I claim that virtually all CPU intensive tasks fit well into the idea
> of calculation of outputs on read only inputs. For example a query on
> a set of tables used to extract information doesn't modify any of the
> (persistent) tables. The query "Find the grandparents of people who
> have a brother in law that had open heart surgery before 2001" has to
> examine many database records.
>
> By contrast, mutative changes to an RDB tend to be very simple and
> don't require much CPU effort.

This generalization is sufficiently broad as to be simply false. Some writes are simple; some are not.

> For example
>
> "Add person Bill Heard born in 1970, who married Janet Shaw in
> 1996"
>
> Furthermore they don't depend on the results of time consuming read
> only queries.

Again, this is true of some applications but not of others.
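
Here is the kind of counter-example I have in mind, against a made-up
schema of my own: the written value comes straight out of a potentially
long read-only scan, so the write depends on the query completely and
is exactly as expensive as the query feeding it.

    import java.sql.*;

    class MonthEnd {
        // Hypothetical schema (accounts, transactions): the value
        // written is computed from a read-only aggregate, so the
        // write transaction is only as fast as the query feeding it.
        static void postFee(Connection c, long accountId) throws SQLException {
            double avg;
            try (PreparedStatement q = c.prepareStatement(
                    "SELECT AVG(amount) FROM transactions WHERE account_id = ?")) {
                q.setLong(1, accountId);
                try (ResultSet rs = q.executeQuery()) {
                    rs.next();
                    avg = rs.getDouble(1);   // the time-consuming read
                }
            }
            try (PreparedStatement u = c.prepareStatement(
                    "UPDATE accounts SET fee = ? WHERE id = ?")) {
                u.setDouble(1, avg * 0.01);  // the write depends on it
                u.setLong(2, accountId);
                u.executeUpdate();
            }
        }
    }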

Anyway, this idea is sufficiently limited in scope as to be unworthy of attention for anyone building a general-purpose system. Lack of atomic reads is a deal breaker. It might be useful for some narrow set of applications, but it would take some work to characterize them.

Marshall