Re: Concurrency in an RDB

From: David <davidbl_at_iinet.net.au>
Date: 8 Dec 2006 18:35:15 -0800
Message-ID: <1165631715.879256.179690_at_f1g2000cwa.googlegroups.com>


Bob Badour wrote:
> David wrote:
>
> > Bob Badour wrote:

 [snip]

> > In any case, restricting yourself to transactions that are local to the
> > process managing the DB, can you outline a realistic example where
> > long-lived mutative transactions are necessary?
>
> Why should we restrict ourselves to a tiny special class of applications?

Did you actually read my last post? The "restriction" arises from an alternative, general approach to distributed computing.

> >>>Note that my premise is that concurrency is only needed for CPU
> >>>intensive tasks and that these only require shared read access.
> >>
> >>That is one of your assumptions that is false.
> >
> > Can you show that?
>
> With all due respect, your assumption is not reasonable on its face. If
> you think it is reasonable, the onus lies on you to demonstrate its
> reasonableness.

My claim can't easily be proven true, but it should easily be falsified if it turns out to be wrong. You haven't yet provided an example.

> Concurrency is especially needed for I/O bound tasks, and in fact
> freeing a CPU for other tasks while waiting for an I/O operation to
> complete is one of the major benefits of concurrency.

This is a good point. However, in practice, read/write caching trivially solves it. I have built a Persistent Object Store that supports (amortised) many hundreds of thousands of simple mutative transactions per second. I don't see any reason why similar performance couldn't be achieved in an RDB.

This depends on the assumption that mutative operations tend to work on state that is already cached in memory. If that is not the case, and a large fraction of transactions must block on a read from disk (because the write patterns aren't localised), then concurrency won't help a great deal: transaction throughput will be poor regardless of what you do. Note furthermore that making the writers block the readers can improve writer transaction throughput, because the readers tend to upset cache locality for the writers.
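To make that concrete, here is a minimal sketch in Java of what I mean by a write-behind store. It is not my actual implementation and all names are hypothetical: each small mutative transaction hits the in-memory image and appends a log record, and a single background thread batches many records into one disk write, so the per-transaction I/O cost is amortised.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch of a write-behind store: each small mutative transaction updates
    // the in-memory image and appends a log record; one background thread
    // groups many records into a single disk write, amortising the I/O cost.
    public class WriteBehindStore {
        private final Map<String, String> image = new ConcurrentHashMap<>();
        private final BlockingQueue<String> log = new LinkedBlockingQueue<>();

        public WriteBehindStore() {
            Thread flusher = new Thread(this::flushLoop, "log-flusher");
            flusher.setDaemon(true);
            flusher.start();
        }

        // A "transaction" here is one key update: it never waits on the disk.
        public void put(String key, String value) {
            image.put(key, value);
            log.add(key + "=" + value);
        }

        public String get(String key) {
            return image.get(key);
        }

        private void flushLoop() {
            List<String> batch = new ArrayList<>();
            try {
                while (true) {
                    batch.add(log.take());      // wait for the first record
                    log.drainTo(batch);         // then grab everything else queued
                    writeBatchToDisk(batch);    // one synced write for the whole batch
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        private void writeBatchToDisk(List<String> batch) {
            // Stand-in for an appending, fsync'd log write.
            System.out.println("flushed " + batch.size() + " records");
        }

        public static void main(String[] args) throws InterruptedException {
            WriteBehindStore store = new WriteBehindStore();
            for (int i = 0; i < 100_000; i++) {
                store.put("key" + (i % 1000), "value" + i);   // hot working set stays in memory
            }
            Thread.sleep(200);   // let the flusher drain the tail of the log
        }
    }

The point is simply that the mutative path never waits for the disk as long as the working set stays cached.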

[snip]

> >>If one loads any system beyond its capacity, it will exhibit
> >>pathological behaviour.
> >>The common term for this is "thrashing" where
> >>concurrent processes spend more time on overhead than actual work. It
> >>can happen in a lock manager. It can happen in a cache. It can happen in
> >>a virtual memory manager.
> >>
> >>All real computer systems have finite capacity.
> >
> > I don't find your generalisation useful. If faced with a choice I
> > would always pick a system that maintains a reasonable transaction
> > throughput rather than one that falls in a heap (all other things being
> > equal).
>
> Since you will never be faced with that choice and will always have to
> accept a system that eventually falls in a heap one way or another, I
> don't know what else to say. You can either accept reality or face the
> consequences of your delusions.

A system may be overloaded temporarily, and if it doesn't fall in a heap it may be able to quickly recover. That is useful.

[snip]

> >>What I'm saying is that if it's possible to add the whole family at a
> >>time (according to integrity constraints or atomicity requirements) then
> >>it would be silly to design the application to prevent it.
> >
> > We agree on that, but I don't think it's relevant to our discussion.
>
> Why then do you propose a silly design that prevents it?

I don't. I merely claim that it is desirable (and in practice possible) to use fine-grained mutative transactions.
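To be concrete about what I mean by fine-grained, here is a rough sketch in plain JDBC (the table and column names are hypothetical, and this is only an illustration): each logically independent change gets its own short transaction, so no row locks are held across changes.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Sketch of fine-grained mutative transactions: each logically independent
    // change is committed on its own, so locks are held only briefly.
    class FineGrainedUpdates {
        static void recordVisits(Connection con, String[] names) throws SQLException {
            con.setAutoCommit(false);
            String sql = "UPDATE person SET visits = visits + 1 WHERE name = ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (String name : names) {
                    ps.setString(1, name);
                    ps.executeUpdate();
                    con.commit();   // commit each independent change immediately
                }
            }
        }
    }

If the changes were *not* independent (your family example), then of course they belong in one transaction; nothing in this design prevents that.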

> >>>Mutative changes should be applied in as small a transaction as
> >>>possible in order to promote concurrency and avoid dead-lock. That is
> >>>commonly discussed in books on RDB.
> >>
> >>I agree. At the physical level, the dbms should not hold shared
> >>resources any longer than absolutely necessary. I also suggest the dbms
> >>should not hold more shared resources than absolutely necessary.
> >
> > Note that with distributed transactions you may actually prefer to do a
> > reasonable amount in a transaction because of the significant
> > per-transaction overheads.
>
> Your statement is either a tautology or nonsense. I cannot be bothered
> exerting the effort to tell which.

My statement was quite simple: distributed transactions (in contrast to in-process transactions) have large per-transaction overheads, and therefore issuing many fine-grained transactions instead of fewer coarse-grained ones can degrade performance.

I don't like the whole idea of distributed transactions. Asynchronous message queues are a better basis for distributed computing.
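As a minimal sketch of that alternative (an in-process queue standing in for a real messaging system, and all names hypothetical): senders fire off update messages and move on, and only the process that owns the database turns each message into a small local transaction, so there is no two-phase commit spanning machines.

    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch: an asynchronous message queue instead of a distributed
    // transaction. The sender never waits for the database; the owning
    // process applies each message as its own small, purely local transaction.
    class AsyncQueueDemo {
        record UpdateMsg(String key, String value) {}

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<UpdateMsg> queue = new LinkedBlockingQueue<>();
            Map<String, String> database = new ConcurrentHashMap<>();

            // The process that owns the DB: drains messages, applies local transactions.
            Thread dbOwner = new Thread(() -> {
                try {
                    while (true) {
                        UpdateMsg msg = queue.take();
                        database.put(msg.key(), msg.value());   // a small local transaction
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            dbOwner.setDaemon(true);
            dbOwner.start();

            // A "remote" client: enqueue and move on, no distributed commit to wait for.
            for (int i = 0; i < 1000; i++) {
                queue.put(new UpdateMsg("key" + i, "value" + i));
            }
            Thread.sleep(100);   // give the owner time to drain the queue
            System.out.println("applied " + database.size() + " updates");
        }
    }

The sender's latency is just the cost of an enqueue, and the per-message overhead on the receiving side is amortised in the same way as in the write-behind sketch above.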

[snip]

> > Note that speculative research that is prepared to throw caution to the
> > wind can sometimes (albeit rarely) yield great results.
>
> Once again, new hypotheses that simply ignore past observations get
> ignored. Until you offer any evidence that you have a clue about prior
> work, your hypothesis falls into this category. Old hypotheses that fail
> to predict new observations get discarded. You have not identified any
> such hypothesis.
>
> Thus, you have offered nothing useful.
>
> I have already given your proposal more time and attention than it
> merited. You now have a choice. You can learn enough of the prior work
> that you abandon your hypothesis or that you can offer a convincing
> argument why the prior work had everything all wrong. Or you can bounce
> off the bottom of the killfile with the other cranks who frequent this
> newsgroup.

You're spending your time and attention writing long-winded paragraphs like the above with zero information content.

Your only counter-arguments have been related to network or I/O latency. My research shows that both can be eliminated from, or amortised away from, exclusive write transactions.

Cheers,
David
