Re: Concurrency in an RDB

From: David <davidbl_at_iinet.net.au>
Date: 15 Dec 2006 02:41:25 -0800
Message-ID: <1166179285.677999.61840_at_j72g2000cwa.googlegroups.com>


Marshall wrote:
> On Dec 7, 6:14 pm, "David" <davi..._at_iinet.net.au> wrote:
> >
> > Consider that a single mutex is used to protect an entire RDB.
>
> *cough*
>
>
> > This
> > mutex offers shared read / exclusive write access modes. It avoids
> > writer starvation by blocking further readers once one or more writer
> > threads are waiting on the mutex.
>
> If you have shared read / exclusive write, then you have no mechanism
> for controlling what database state(s) an application may see in
> a single transaction.
>
> Consider:
> begin read transaction 1
> perform write transaction 2
> another read in transaction 1
> perform write transaction 3
> read and finish transaction 1
>
> Writers didn't block readers, but did block each other.
> Deadlock is impossible. However, transaction 1 has seen
> three different database states.
>
> If you can only control writers, you can't have
> atomic reads. Sometimes you must have atomic reads.

You misunderstand my proposal. Readers and writers are mutually exclusive. When a reader holds the lock, other readers may also acquire it, but no writer is allowed access until the last reader releases the lock.

When a writer opens the lock, no other thread (reader or writer) can open the lock.
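A minimal sketch of that locking discipline in Python (my own illustration, not code from the original discussion; the class and attribute names are invented): readers share the lock, a writer is exclusive, and a *waiting* writer blocks new readers, which is what avoids writer starvation.

```python
import threading

class WriterPriorityRWLock:
    """Shared-read / exclusive-write lock, sketching the scheme
    described above: once a writer is waiting, new readers are
    blocked until the writer has had its turn."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # readers currently holding the lock
        self._writer = False       # True while a writer holds the lock
        self._waiting_writers = 0  # writers queued on the lock

    def acquire_read(self):
        with self._cond:
            # New readers wait while a writer holds *or awaits* the lock.
            while self._writer or self._waiting_writers > 0:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # last reader out: wake writers

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

So in the scenario you gave, transaction 1 would hold the read lock across all three of its reads, and write transactions 2 and 3 would block until it finished; transaction 1 sees a single database state.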

[further discussion based on misunderstanding snipped]

> > In some database applications repeated dead-lock scenarios occur, and
> > the database can become very inefficient because transactions are
> > continually aborted and rolled back. This is a rather insidious
> > problem because a database can work fine under lighter loads, then
> > suddenly take on pathological behavior when the load increases
> > slightly.
> >
> > In a shared read mode we get the ultimate in concurrency.
>
> Um, no. For example, in a simple scenario where two processes
> were writing regularly each to their own table, under your system
> there would be contention, but under something even so simple
> as table level locking, there would not be.

I mean *during* the shared read mode we get the ultimate in concurrency.

Yes, of course writer threads enjoy no concurrency at all. That is exactly my proposal. I claim that with a careful design, you don't need concurrency amongst writers, because that is not the bottleneck. This in turn depends on some assumptions that I make.

> > We allow any
> > number of threads to have unrestricted access to all data in the
> > database, without any additional locking (i.e. apart from the initial
> > lock). It is fairly easy to support shared use of caches with LRU
> > eviction policies. Data loaded from disk represents cached state, as
> > well as all calculated (or "derived") state. So it is easy to
> > allow different threads to share in the calculated results that they
> > generate. This makes it quite easy to parallelise read-only
> > algorithms across all processors in multi-core machines. It also makes
> > it easy to ensure that processors work concurrently with I/O.
>
> Parallelizing read-only algorithms isn't a big accomplishment.

This is exactly my point - it is very easy to get all cores in a multi-core machine working hard when they gain shared read access to the database.

By contrast I claim that mutative work, by its very nature, is not "algorithmic" (CPU intensive) and therefore, as long as I/O or network latency can be avoided, it is easy for a computer to generate enough data changes to saturate disk write bandwidth.

> > I like the analogy to functional programming. This emphasizes the idea
> > that outputs are (completely) calculated from read-only inputs. It is
> > well known that FP offers one of the best ways to achieve concurrency
> > in computer science.
>
> I'm an admirer of functional programming, yet I have a hard time
> with the above claim. Especially the "well known" part. Care to cite?

What I mean is that the FP style, by its very nature, tells us where there are opportunities for parallelism. E.g.

y = f(x) + g(x) * h(x)

f(x), g(x), h(x) can all be calculated in parallel.

Note however that if f, g, h have side effects (i.e. not pure FP) then parallelism can't be assumed. It is also harder to reason about the program.
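As a concrete sketch (my own illustration, with made-up pure functions f, g, h), the three subexpressions can be submitted to a thread pool and combined afterwards. Note that for genuinely CPU-bound work in CPython you would want processes rather than threads, but the structure is the same:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy pure functions standing in for f, g, h above.
def f(x): return x + 1
def g(x): return x * 2
def h(x): return x - 3

def y(x):
    # Because f, g and h are pure, the three calls are independent
    # and may be evaluated in any order - or, as here, in parallel.
    with ThreadPoolExecutor(max_workers=3) as pool:
        rf = pool.submit(f, x)
        rg = pool.submit(g, x)
        rh = pool.submit(h, x)
    return rf.result() + rg.result() * rh.result()
```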

> > I claim that virtually all CPU intensive tasks fit well into the idea
> > of calculation of outputs on read only inputs. For example a query on
> > a set of tables used to extract information doesn't modify any of the
> > (persistent) tables. The query "Find the grandparents of people who
> > have a brother in law that had open heart surgery before 2001" has to
> > examine many database records.
> >
> > By contrast, mutative changes to an RDB tend to be very simple and
> > don't require much CPU effort.
>
> This generalization is sufficiently broad as to be simply false. Some
> writes are simple; some are not.

Please outline an example. That is why I posted!

> > For example
> >
> > "Add person Bill Heard born in 1970, who married Janet Shaw in
> > 1996"
> >
> > Furthermore they don't depend on the results of time consuming read
> > only queries.
>
> Again, this is true of some applications but not of others.

If you have experience with that I would be interested.

[further discussion based on misunderstanding of locking snipped]

Cheers,
David