Re: Concurrency in an RDB

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Fri, 08 Dec 2006 16:23:26 GMT
Message-ID: <2kgeh.30158$cz.453890_at_ursa-nb00s0.nbnet.nb.ca>


David wrote:

> Bob Badour wrote:
> 

>>David wrote:
>>
>>>Bob Badour wrote:
>>>
>>>
>>>>David wrote:
>>>>
>>>>>I have some thoughts on how best to achieve concurrency in an RDB. You
>>>>>may want to keep in mind that I am a "systems programmer" with
>>>>>little experience in using an RDB. I'm posting to this newsgroup to
>>>>>see whether my ideas have a reasonable basis.
>>>>>
>>>>>Consider that a single mutex is used to protect an entire RDB. This
>>>>>mutex offers shared read / exclusive write access modes. It avoids
>>>>>writer starvation by blocking further readers once one or more writer
>>>>>threads are waiting on the mutex.
>>>>
>>>>Some write transactions take a long time to complete and will thus lock
>>>>everyone else out of the database.
>>>
>>>Can you outline for me a real life example where long lived mutative
>>>transactions are necessary?
>>
>>In some places, 200 microseconds is too long:
>>http://www.embeddedstar.com/press/content/2003/12/embedded11970.html
>>
>>Are you suggesting that it is possible to acquire and release a
>>universal exclusive lock over a distributed system in less than 200
>>microseconds?
>>
>>It takes 500 times as long as that to ping my local ISP.
>
[snip]
>
> In any case, restricting yourself to transactions that are local to the
> process managing the DB, can you outline a realistic example where
> long-lived mutative transactions are necessary?

Why should we restrict ourselves to a tiny special class of applications?
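
For reference, the scheme described at the top of this thread (one shared-read/exclusive-write mutex over the entire database, with new readers blocked once a writer is waiting) is what systems programmers call a writer-preferring read/write lock. A minimal sketch, assuming C++17; the class name and layout are illustrative only, not taken from any DBMS:

// Minimal sketch of the proposed lock: shared read / exclusive write,
// with further readers blocked as soon as any writer is waiting.
// Assumes C++17. All names here are illustrative.
#include <condition_variable>
#include <mutex>

class GlobalDbLock {
    std::mutex m_;
    std::condition_variable cv_;
    int active_readers_ = 0;
    int waiting_writers_ = 0;
    bool writer_active_ = false;

public:
    void lock_shared() {
        std::unique_lock<std::mutex> lk(m_);
        // Writer preference: a pending writer blocks further readers,
        // which is how the proposal avoids writer starvation.
        cv_.wait(lk, [&] { return !writer_active_ && waiting_writers_ == 0; });
        ++active_readers_;
    }
    void unlock_shared() {
        std::unique_lock<std::mutex> lk(m_);
        if (--active_readers_ == 0) cv_.notify_all();
    }
    void lock() {
        std::unique_lock<std::mutex> lk(m_);
        ++waiting_writers_;
        cv_.wait(lk, [&] { return !writer_active_ && active_readers_ == 0; });
        --waiting_writers_;
        writer_active_ = true;
    }
    void unlock() {
        std::unique_lock<std::mutex> lk(m_);
        writer_active_ = false;
        cv_.notify_all();
    }
};

Every transaction in the system serialises through this one object, which is precisely what is at issue below.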

>>>>[snip]
>>>>
>>>>>Exclusive write access to the entire RDB means there can never be
>>>>>dead-lock. This eliminates a lot of complexity and overhead.
>>>>
>>>>It also eliminates almost all useful concurrency.
>>>
>>>Note that my premise is that concurrency is only needed for CPU
>>>intensive tasks and that these only require shared read access.
>>
>>That is one of your assumptions that is false.
>
> Can you show that?

With all due respect, your assumption is not reasonable on its face. If you think it is reasonable, the onus lies on you to demonstrate its reasonableness.

Concurrency is especially needed for I/O-bound tasks, and in fact freeing a CPU for other tasks while waiting for an I/O operation to complete is one of the major benefits of concurrency.
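
To make that concrete: under the proposed global lock, a writer that performs even one synchronous disk write holds every reader hostage for the full I/O wait. A minimal sketch, assuming C++17 and simulating the disk wait with a sleep:

// Why "exclusive write takes so little time" fails for I/O-bound work:
// a writer that touches the disk holds the global lock for the whole
// I/O wait, stalling every reader. C++17; the 10 ms I/O is simulated.
#include <chrono>
#include <iostream>
#include <shared_mutex>
#include <thread>

std::shared_mutex db_lock;  // stands in for the proposed global mutex

void writer() {
    std::unique_lock<std::shared_mutex> w(db_lock);
    // Simulate a synchronous disk write or log flush under the lock.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

void reader(int id) {
    auto t0 = std::chrono::steady_clock::now();
    std::shared_lock<std::shared_mutex> r(db_lock);
    auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - t0);
    std::cout << "reader " << id << " stalled " << waited.count() << " ms\n";
}

int main() {
    std::thread w(writer);
    std::this_thread::sleep_for(std::chrono::milliseconds(1));  // let writer win
    std::thread r1(reader, 1), r2(reader, 2);
    w.join(); r1.join(); r2.join();
}

A single 10 ms log flush under the exclusive lock stalls every reader for roughly 10 ms, which is fifty times the 200 microseconds mentioned above.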

>>>If
>>>that is the case then there is plenty of concurrency available during
>>>shared read modes. Exclusive write access takes so little time that it
>>>can be neglected.
>>
>>That is a second of your assumptions that is false.
>
> See above

Ditto.

>>>>>In some database applications repeated dead-lock scenarios occur, and
>>>>>the database can become very inefficient because transactions are
>>>>>continually aborted and rolled back.
>>>>
>>>>Which applications are those? And why are dead-locks necessarily a
>>>>problem for those applications?
>>>
>>>I don't have the RDB experience to know how often and to what extent
>>>dead-lock seriously degrades performance. However, I have heard of
>>>real cases where repeated dead-lock kills performance.
>>
>>If one loads any system beyond its capacity, it will exhibit
>>pathological behaviour.
>>The common term for this is "thrashing" where
>>concurrent processes spend more time on overhead than actual work. It
>>can happen in a lock manager. It can happen in a cache. It can happen in
>>a virtual memory manager.
>>
>>All real computer systems have finite capacity.

> 
> I don't find your generalisation useful.   If faced with a choice I
> would always pick a system that maintains a reasonable transaction
> throughput rather than one that falls in a heap (all other things being
> equal).

Since you will never be faced with that choice and will always have to accept a system that eventually falls in a heap one way or another, I don't know what else to say. You can either accept reality or face the consequences of your delusions.

>>>>[snip]
>>>>
>>>>
>>>>>In a shared read mode we get the ultimate in concurrency.
>>>>
>>>>Shared read/shared write is the ultimate in security. The use of the log
>>>>to provide multiple concurrent views of uncommitted data gets that job done.
>>>
>>>IMO the conservative locking I propose will lead to far superior
>>>performance. This is a systems programming question, and I can't
>>>back up the claim with quantitative results at present.
>>
>>Your opinion doesn't count for much, and I can confidently counter that
>>you will never back up the claim with quantitative results except for
>>perhaps a tiny special class of applications.

> 
> You're saying that in your opinion my opinion doesn't count for much.
> Does your opinion account for much?   Sorry I couldn't help myself.

I merely made a factual observation. You can accept that fact or not. Opinion has no relevance to it.

>>>>[snip]
>>>>
>>>>>Subject to integrity constraints, mutative work can be fine grained.
>>>>>For example, it is not necessary to add a whole family at once to a DB;
>>>>>it is possible to add one person at a time.
>>>>
>>>>One of the great things about the relational model is set-level
>>>>operation. It is not necessary to add one person at a time when one can
>>>>add a whole family.
>>>
>>>What I'm saying is that if it's not necessary to add a whole family
>>>at a time (according to integrity constraints or atomicity
>>>requirements) then it would be silly to design the application that
>>>way.
>>
>>What I'm saying is that if it's possible to add the whole family at a
>>time (according to integrity constraints or atomicity requirements) then
>>it would be silly to design the application to prevent it.
>
> We agree on that, but I don't think it's relevant to our discussion.

Why then do you propose a silly design that prevents it?
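
For the avoidance of doubt, set-level insertion looks like this. A minimal sketch using SQLite's C API purely for concreteness (any SQL DBMS would do; the schema and data are invented for the example, and multi-row VALUES requires a reasonably recent SQLite):

// Set-level insertion: the whole family in one statement, one atomic change.
#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3* db = nullptr;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db, "CREATE TABLE person(name TEXT, family TEXT);",
                 nullptr, nullptr, nullptr);

    const char* set_level =
        "INSERT INTO person(name, family) VALUES "
        "('Ann','Smith'), ('Ben','Smith'), ('Cal','Smith');";
    if (sqlite3_exec(db, set_level, nullptr, nullptr, nullptr) != SQLITE_OK)
        std::fprintf(stderr, "%s\n", sqlite3_errmsg(db));

    sqlite3_close(db);
    return 0;
}

One statement, one atomic change. The row-at-a-time alternative needs three statements and leaves any cross-row constraint exposed between them.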

>>>Mutative changes should be applied in as small a transaction as
>>>possible in order to promote concurrency and avoid dead-lock. That is
>>>commonly discussed in books on RDB.
>>
>>I agree. At the physical level, the dbms should not hold shared
>>resources any longer than absolutely necessary. I also suggest the dbms
>>should not hold more shared resources than absolutely necessary.

> 
> Note that with distributed transactions you may actually prefer to do a
> reasonable amount in a transaction because of the significant
> per-transaction overheads.

Your statement is either a tautology or nonsense. I cannot be bothered exerting the effort to tell which.

>>>>[snip]
>>>>
>>>>I suggest if you look at any text on concurrency and transactions in
>>>>dbmses, you will find your proposal has been well-considered and
>>>>long-ago rejected.
>>>
>>>Call me pig/big headed but I don't always believe what I read!
>>
>>Scientific publications have bibliographies for a reason. A new proposal
>>that simply ignores all prior work, such as your proposal does, gets
>>rejected without much further thought.
>
> I don't ignore prior work.

Au contraire.

[snip]

> Note that speculative research that is prepared to throw caution to the
> wind can sometimes (albeit rarely) yield great results.

Once again, new hypotheses that simply ignore past observations get ignored. Until you offer any evidence that you have a clue about prior work, your hypothesis falls into this category. Old hypotheses that fail to predict new observations get discarded. You have not identified any such hypothesis.

Thus, you have offered nothing useful.

I have already given your proposal more time and attention than it merited. You now have a choice. You can learn enough of the prior work to abandon your hypothesis, or you can offer a convincing argument for why the prior work had everything all wrong. Or you can bounce off the bottom of the killfile with the other cranks who frequent this newsgroup.
