Re: Concurrency in an RDB

From: David <davidbl_at_iinet.net.au>
Date: 8 Dec 2006 19:40:25 -0800
Message-ID: <1165635625.668261.208540_at_j44g2000cwa.googlegroups.com>


Michael Ortega-Binderberger wrote:
> Hi, sorry, but I found this thread rather amusing; I'm adding some
> comments in between below.
>
> On Fri, 8 Dec 2006, Bob Badour wrote:
>
> > David wrote:
> >
> >> Bob Badour wrote:

[snip]

> >> In any case, restricting yourself to transactions that are local to the
> >> process managing the DB, can you outline a realistic example where
> >> long-lived mutative transactions are necessary?
> >
> > Why should we restrict ourselves to a tiny special class of applications?
>
> Does not matter; whatever the application writer wants to
> do. If long write transactions are called for, that's ok. No
> need to restrict ourselves.

Please outline a realistic example where the model calls for long write transactions.

[snip]
> > Concurrency is especially needed for I/O-bound tasks, and in fact freeing a
> > CPU for other tasks while waiting for an I/O operation to complete is one of
> > the major benefits of concurrency.
>
> I completely agree; the notion that one RDB-wide mutex is
> good is simply, well, too simplistic. I agree that this
> approach was rejected long ago.
> Concurrency is needed at the I/O level, but also at the
> overall logical schema level. In your proposal, you lock
> the whole db exclusively for a write (which you incorrectly
> assume costs very little; writes cost a lot).

Not necessarily. If

1.	write caching is available and the system is not write-bound,
2.	the state to be modified is already resident in memory, and
3.	the mutative transaction is fine-grained,

then a write transaction can be done *very* quickly (~10 usec), including the lock of the DB-wide mutex (~1 usec).
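To make that concrete, here is a minimal sketch (hypothetical names, C++; it assumes the state is memory-resident and the log is flushed asynchronously, per the three conditions above) of the kind of fine-grained write transaction I have in mind:

  #include <mutex>
  #include <string>
  #include <unordered_map>
  #include <vector>

  // Hypothetical store: all state memory-resident, one DB-wide mutex,
  // mutations appended to an in-memory log that a background thread
  // flushes later, so no transaction ever waits on the disk.
  class Store {
  public:
      void Put(const std::string& key, const std::string& value) {
          std::lock_guard<std::mutex> guard(db_mutex_);  // ~1 usec to acquire
          table_[key] = value;                   // mutate resident state
          log_.push_back(key + '=' + value);     // buffered, not flushed here
      }                                          // mutex held for a few usec
  private:
      std::mutex db_mutex_;                      // the single DB-wide mutex
      std::unordered_map<std::string, std::string> table_;
      std::vector<std::string> log_;             // flushed asynchronously
  };

Nothing in Put() waits on the disk, so the mutex hold time is dominated by the in-memory work.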

> Say, why is this needed? Why not instead exclusively lock only
> the table which you will modify, i.e. have a mutex per table?
> Surely this increases concurrency even more per your
> definition.

But concurrency amongst writer threads is not generally a bottleneck, because I assume that write transactions are very fine-grained and lock the database only for extremely short periods.

> Now writers can proceed concurrently as long as
> they write to different tables. This is clearly superior to
> your original proposal per your measure of concurrency.
> It is also completely unworkable. It will easily lead to
> inconsistencies in the database. So let's come up with
> another approach: one mutex like before for every table and
> one "super mutex" for the whole RDB (using "mutex" here in a
> misleading way; systems programmers tend not to
> understand the differences between locks and latches).
> So let's suppose I lock the table in exclusive mode when I
> want to change it, but how to keep things ok? Lock the new
> "super mutex" in a new mode: I'm planning to change an
> underlying table. So you start to have more lock modes, and
> more concurrency.
> I won't spend the time here explaining the details of how all
> this works. Just think about it a bit and see how this
> can improve concurrency and still keep things ok. Once you
> have thought about it, you will see it's far more complicated
> than your original proposal, but also has more concurrency.

More concurrency, less performance!
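
For the record, here is roughly what the two-level scheme you describe looks like in code (a sketch with made-up names; a real DBMS would use a proper lock manager with more modes):

  #include <mutex>
  #include <shared_mutex>

  // Hypothetical two-level scheme: a table writer takes the whole-RDB
  // "super mutex" in shared ("intention") mode plus its table's mutex;
  // a schema change takes the super mutex exclusively, which waits out
  // every in-flight table writer.
  struct Database {
      std::shared_mutex super;   // whole-RDB "super mutex"
      std::mutex orders;         // one mutex per table
      std::mutex customers;
  };

  void WriteOrders(Database& db) {
      std::shared_lock<std::shared_mutex> intent(db.super);  // announce intent
      std::lock_guard<std::mutex> table(db.orders);          // exclusive table lock
      // ... modify the orders table ...
  }

  void AlterSchema(Database& db) {
      std::unique_lock<std::shared_mutex> all(db.super);     // excludes all writers
      // ... change tables consistently ...
  }

Every writer now pays two lock acquisitions plus the bookkeeping for the extra modes, which is exactly the trade-off I'm pointing at.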

[snip]

> >> I don't find your generalisation useful. If faced with a choice I
> >> would always pick a system that maintains a reasonable transaction
> >> throughput rather than one that falls in a heap (all other things being
> >> equal).
> >
> > Since you will never be faced with that choice and will always have to accept
> > a system that eventually falls in a heap one way or another, I don't know
> > what else to say. You can either accept reality or face the consequences of
> > your delusions.
>
> Agreed, all systems have limits; when pushed, the system
> will either reject servicing new requests or degrade
> performance. Too bad.

Graceful degradation is desirable.

> I agree, opinions count for little; you need to know the
> facts. And the facts are that since you cannot
> quantitatively prove that your single mutex is better than
> what all the big db vendors have implemented, and you confess
> you're no db guru, your opinion counts for little.

Unless I'm right.

> Again agreed. The relational model works best by doing set-
> based operations. Actually, in a real db, the cost of adding
> say 5 family members in one set operation/transaction,
> compared to 5 different transactions to add 1 at a time, is
> very different. Adding one at a time is way slower,
> probably more than 4 times as slow as adding all 5 in one
> transaction. Why? Look at how logging works and how the
> logs have to be hard-flushed to disk before concluding the
> transaction. Don't know what this means? Go read the
> bibliography.

I'm very familiar with WAL, and with the ARIES algorithm and its implementation.

You assume durability implies the log has to be flushed to disk on each transaction. That depends on the hardware and the durability requirements for the application.

In the Persistent Object Store I have built, there is minimal overhead (~1 usec) in using extremely fine-grained transactions. I implemented a log-structured store and don't flush the log after every transaction. I achieve write performance that blows a WAL-based system out of the water.
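
To sketch the idea (hypothetical code, not my actual store): a commit just appends to an in-memory buffer, and a background thread flushes a whole batch at once, so the per-transaction cost is an append rather than a disk flush:

  #include <atomic>
  #include <chrono>
  #include <fstream>
  #include <mutex>
  #include <string>
  #include <thread>
  #include <vector>

  // Hypothetical deferred-flush log: Commit() only appends in memory;
  // the flusher thread writes out a whole batch at once (a real store
  // would fsync here), so no transaction waits on the disk.
  class Log {
  public:
      explicit Log(const char* path)
          : out_(path, std::ios::binary | std::ios::app),
            flusher_([this] { FlushLoop(); }) {}
      ~Log() { stop_ = true; flusher_.join(); }

      void Commit(const std::string& record) {   // cost: one append
          std::lock_guard<std::mutex> g(m_);
          buffer_.push_back(record);
      }
  private:
      void FlushLoop() {
          while (!stop_) {
              std::this_thread::sleep_for(std::chrono::milliseconds(10));
              Drain();
          }
          Drain();                               // final batch at shutdown
      }
      void Drain() {
          std::vector<std::string> batch;
          { std::lock_guard<std::mutex> g(m_); batch.swap(buffer_); }
          for (const std::string& r : batch) out_ << r << '\n';
          out_.flush();                          // one flush per batch
      }
      std::ofstream out_;
      std::mutex m_;
      std::vector<std::string> buffer_;
      std::atomic<bool> stop_{false};
      std::thread flusher_;
  };

Durability then lags by at most one flush interval, which is exactly the kind of application-dependent durability requirement I referred to above.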

[snip]

> By the way, you mentioned multi-branch, distributed,
> text-editing-oriented databases that sync their content so people
> can edit the same document in a distributed fashion.
> 2 answers:
> 1) Look at ClearCase (it has replicated dbs, branching, auto
> merging, it's text oriented (supports binary too, just in
> case), has extensive branching...). It's also a pain to use,
> but don't ignore prior art.
> 2) Look at Google Docs and their Writely acquisition. I had
> thought about doing something like that, and they actually did
> it. I wish I had been part of that team.

AFAIK neither uses operational transform.
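
For those who haven't seen it: operational transform rewrites one edit against a concurrent edit so that both replicas converge. A toy sketch for concurrent single-character inserts (index-based positions, names are my own):

  // Toy operational transform for two concurrent single-character
  // inserts into the same string: shift one insert past the other so
  // that both replicas converge on the same document.
  struct Insert { int pos; char ch; };

  // Rewrite op 'a' so it can be applied after concurrent op 'b'.
  Insert Transform(Insert a, const Insert& b, bool b_wins_ties) {
      if (b.pos < a.pos || (b.pos == a.pos && b_wins_ties))
          ++a.pos;  // b's character landed at or before a's target
      return a;
  }

Applying the two inserts on each replica, with the remote one transformed, leaves both replicas holding the same string.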

Cheers,
David Barrett-Lennard
