| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> comp.databases.theory -> Concurrency in an RDB
I have some thoughts on how best to achieve concurrency in an RDB. You
may want to keep in mind that I am a "systems programmer" with
little experience in using an RDB. I'm posting to this newsgroup to
see whether my ideas have a reasonable basis.
Consider that a single mutex is used to protect an entire RDB. This mutex offers shared read / exclusive write access modes. It avoids writer starvation by blocking further readers once one or more writer threads are waiting on the mutex.
I'm basically suggesting conservative rather than strict 2PL. This may sound heretical, or at least far too simplistic (and inefficient) to be useful in an industrial strength RDB. However I contend that the approach deserves to me analyzed in some detail. I suspect it is a case of "less is more".
With such coarse level locking the CPU overheads of gaining and releasing locks are minimized (simply because locks are opened so infrequently).
Exclusive write access to the entire RDB means there can never be dead-lock. This eliminates a lot of complexity and overhead. For example there is no need for dead-lock detection such as by looking for a cycle in the wait-for graph. There is also no need to select a "victim" transaction, abort it and then later restart it from scratch.
In some database applications repeated dead-lock scenarios occur, and the database can become very inefficient because transactions are continually aborted and rolled back. This is a rather insidious problem because a database can work fine under lighter loads then suddenly takes on pathological behavior when the load increases slightly.
In a shared read mode we get the ultimate in concurrency. We allow any number of threads to have unrestricted access to all data in the database, without any additional locking (i.e. apart from the initial lock). It is fairly easy to support shared use of caches with LRU eviction policies. Data loaded from disk represents cached state, as well as all calculated (or "derived") state. So it is easy to allow different threads to share in the calculated results that they generate. This makes it quite easily to parallelise read-only algorithms across all processors in multi-core machines. It also makes it easy to ensure that processors work concurrently with I/O.
I like the analogy to functional programming. This emphasizes the idea that outputs are (completely) calculated from read-only inputs. It is well known that FP offers one of the best ways to achieve concurrency in computer science.
I claim that virtually all CPU intensive tasks fit well into the idea of calculation of outputs on read only inputs. For example a query on a set of tables used to extract information doesn't modify any of the (persistent) tables. The query "Find the grandparents of people who have a brother in law that had open heart surgery before 2001" has to examine many database records.
By contrast, mutative changes to an RDB tend to be very simple and don't require much CPU effort. For example
"Add person Bill Heard born in 1970, who married Janet Shaw in 1996"
Furthermore they don't depend on the results of time consuming read only queries. If they did then the ingestion rate of the RDB would be very poor. Note that this limits the allowable complexity of the DB integrity constraints.
Subject to integrity constraints, mutative work can be fine grained. For example, it is not necessary to add a whole family at once to a DB; it is possible to add one person at a time.
Even though the changes to the database are simple and can be performed quickly, they may not be well localized. When a RDB makes use of very fine grained predicates (i.e. favors lots of tables with few columns rather than the reverse), adding some information atomically to the database can require updates to many tables and secondary indexes. Therefore an exclusive write lock on the entire RDB is well suited.
This computing model involves two "phases".
A note to OO programmers: The pitfalls and inefficiencies often associated with the observer pattern are solved by this computing model. Derived values are only recalculated as often as they're needed, not according to when the inputs change (which can be far more often). Also the problems of dirty reads and reentrancy problems are eliminated.
Cheers,
David Barrett-Lennard
Received on Thu Dec 07 2006 - 20:14:17 CST
![]() |
![]() |