Re: Concurrency in an RDB

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 09 Dec 2006 03:26:02 GMT
Message-ID: <e1qeh.30338$cz.457253_at_ursa-nb00s0.nbnet.nb.ca>


Michael Ortega-Binderberger wrote:

>
> Hi, Sorry, but found this thread rather amusing, adding some comments in
> between below.
>
> On Fri, 8 Dec 2006, Bob Badour wrote:
>

>> David wrote:
>>
>>> Bob Badour wrote:
>>>
>>>> David wrote:
>>>>
>>>>> Bob Badour wrote:
>>>>>
>>>>>> David wrote:
>>>>>>
>>>>>>> I have some thoughts on how best to achieve concurrency in an 
>>>>>>> RDB.  You
>>>>>>> may want to keep in mind that I am a "systems programmer" with
>>>>>>> little experience in using an RDB.  I'm posting to this newsgroup to
>>>>>>> see whether my ideas have a reasonable basis.
>>>>>>>
>>>>>>> Consider that a single mutex is used to protect an entire RDB.  This
>>>>>>> mutex offers shared read / exclusive write access modes.   It avoids
>>>>>>> writer starvation by blocking further readers once one or more 
>>>>>>> writer
>>>>>>> threads are waiting on the mutex.
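For anyone just joining: the primitive David proposes is roughly the
following writer-preference reader/writer lock. A minimal sketch in
C++17, with illustrative names, not any particular dbms's internals:

    #include <condition_variable>
    #include <mutex>

    class WriterPreferringRWMutex {
        std::mutex m_;
        std::condition_variable cv_;
        int active_readers_ = 0;
        int waiting_writers_ = 0;
        bool writer_active_ = false;
    public:
        void lock_shared() {
            std::unique_lock<std::mutex> lk(m_);
            // New readers block while a writer is active *or waiting*,
            // which is the writer-starvation avoidance David describes.
            cv_.wait(lk, [&] { return !writer_active_ && waiting_writers_ == 0; });
            ++active_readers_;
        }
        void unlock_shared() {
            std::unique_lock<std::mutex> lk(m_);
            if (--active_readers_ == 0) cv_.notify_all();
        }
        void lock() {  // exclusive write access to the whole database
            std::unique_lock<std::mutex> lk(m_);
            ++waiting_writers_;
            cv_.wait(lk, [&] { return !writer_active_ && active_readers_ == 0; });
            --waiting_writers_;
            writer_active_ = true;
        }
        void unlock() {
            std::unique_lock<std::mutex> lk(m_);
            writer_active_ = false;
            cv_.notify_all();
        }
    };

Every write transaction then serializes behind a single instance of
this, which was my objection: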
>>>>>>
>>>>>> Some write transactions take a long time to complete and will thus 
>>>>>> lock
>>>>>> everyone else out of the database.
>>>>>
>>>>> Can you outline for me a real life example where long lived mutative
>>>>> transactions are necessary?
>>>>
>>>> In some places, 200 microseconds is too long:
>>>> http://www.embeddedstar.com/press/content/2003/12/embedded11970.html
>>>>
>>>> Are you suggesting that it is possible to acquire and release a
>>>> universal exclusive lock over a distributed system in less than 200
>>>> microseconds?
>>>>
>>>> It takes 500 times as long as that to ping my local ISP.
>>>
>> [snip]
>>
>>> In any case, restricting yourself to transactions that are local to
>>> the process managing the DB, can you outline a realistic example where
>>> long-lived mutative transactions are necessary?
>>
>> Why should we restrict ourselves to a tiny special class of applications?

>
> Does not matter; whatever the application writer wants to do. If long
> write-transactions are called for, that's ok. No need to restrict ourselves.

Just to make sure everyone is on the same page: we were not discussing application writers but dbms implementers. Designing a 'dbms' that strictly serializes all update transactions restricts all users of the dbms for all time.

>>>>>> [snip]
>>>>>>
>>>>>>> Exclusive write access to the entire RDB means there can never be
>>>>>>> dead-lock.  This eliminates a lot of complexity and overhead.
>>>>>>
>>>>>> It also eliminates almost all useful concurrency.
>>>>>
>>>>> Note that my premise is that concurrency is only needed for CPU
>>>>> intensive tasks and that these only require shared read access.
>>>>
>>>> That is one of your assumptions that is false.
>>>
>>> Can you show that?
>>
>> With all due respect, your assumption is not reasonable on its face. 
>> If you think it is reasonable, the onus lies on you to demonstrate its 
>> reasonableness.
>>
>> Concurrency is especially needed for I/O bound tasks, and in fact 
>> freeing a CPU for other tasks while waiting for an I/O operation to 
>> complete is one of the major benefits of concurrency.

>
> I completely agree; the notion that one RDB-wide mutex is good is
> simply, well, too simplistic. I agree that this approach was
> rejected long ago.
> Concurrency is needed at the I/O level, but also at the overall logical
> schema level. In your proposal, you lock the whole db exclusively for a
> write (which you incorrectly assume costs very little; writes cost a
> lot). Say, why is this needed? Why not instead exclusively lock only the
> table which you will modify? i.e. have a mutex per table. Surely this
> increases concurrency even more per your definition. Now writers can
> proceed concurrently as long as they write to different tables. This is
> clearly superior to your original proposal per your measure of concurrency.
> It is also completely unworkable. It will easily lead to
> inconsistencies in the database. So let's come up with another approach:
> one mutex like before for every table and one "super mutex" for the
> whole RDB (using "mutex" here in a misleading way; systems programmers
> tend not to understand the differences between locks and latches).
> So let's suppose I lock the table in exclusive mode when I want to change
> it, but how to keep things ok? Lock the new "super mutex" in a new mode:
> I'm planning to change an underlying table. So you start to have more
> lock modes, and more concurrency.
> I won't spend the time here explaining the details of how all this works.
> Just think about it a bit and see how this can improve concurrency and
> still keep things ok. Once you have thought about it, you will see it's
> far more complicated than your original proposal, but also has more
> concurrency.
> Welcome to multigranularity locking. As another poster said, go read
> the literature before posting already rejected ideas.
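For the record, the device he is alluding to looks roughly like this -- a
minimal sketch of intention lock modes in C++, using the conventional
IS/IX/S/X names from the literature rather than anything vendor-specific:

    // Lock modes on a node in the granularity hierarchy (db -> table):
    //   IS = intent to read below, IX = intent to write below,
    //   S  = shared read of this node, X = exclusive write of this node.
    enum Mode { IS, IX, S, X };

    // Compatibility matrix: may a request in the row's mode be granted
    // while another transaction holds the column's mode on the same node?
    constexpr bool compatible[4][4] = {
        //            IS     IX     S      X      <- held
        /* IS */    { true,  true,  true,  false },
        /* IX */    { true,  true,  false, false },
        /* S  */    { true,  false, true,  false },
        /* X  */    { false, false, false, false },
    };

To update one table, a transaction takes IX on the database node and then
X on the table node. Two writers on different tables both hold IX on the
database -- compatible, so they proceed concurrently -- while a
whole-database S or X request conflicts with their IX and waits.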

I think the original poster may also have confused locks with transactions, but even substituting one for the other as charitably as possible doesn't make his position entirely sensible.

Transactions may acquire and release many locks over their lifetime, and a transaction's duration may be orders of magnitude longer than that of any single lock.
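To make that concrete: under strict two-phase locking, a transaction
acquires locks piecemeal as it touches data and releases them together at
commit. A minimal C++ sketch with hypothetical row-level granularity:

    #include <mutex>

    struct Row { std::mutex lock; int value = 0; };

    // One transaction: each lock is held only from its acquisition to
    // the common release at the end, so the transaction's duration
    // spans them all.
    void transfer(Row& from, Row& to, int amount) {
        std::unique_lock l1(from.lock);  // first lock, early in the txn
        from.value -= amount;
        // ... arbitrary other work here; l1 stays held throughout ...
        std::unique_lock l2(to.lock);    // second lock, possibly much later
        to.value += amount;
    }   // "commit": both locks released together at end of scope

Acquiring locks piecemeal like this is also exactly what makes deadlock
possible, which is why a real dbms carries lock-ordering or
deadlock-detection machinery.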

>>>>> If that is the case then there is plenty of concurrency available
>>>>> during shared read modes.  Exclusive write access takes so little
>>>>> time that it can be neglected.
>>>>
>>>>
>>>> That is a second of your assumptions that is false.
>>>
>>>
>>> See above
>>
>>
>> Ditto.
>>
>>
>>>>>>> In some database applications repeated dead-lock scenarios occur, 
>>>>>>> and
>>>>>>> the database can become very inefficient because transactions are
>>>>>>> continually aborted and rolled back.
>>>>>>
>>>>>>
>>>>>> Which applications are those? And why are dead-locks necessarily a
>>>>>> problem for those applications?
>>>>>
>>>>>
>>>>> I don't have the RDB experience to know how often and to what extent
>>>>> dead-lock seriously degrades performance.  However, I have heard of
>>>>> real cases where repeated dead-lock kills performance.
>>>>
>>>>
>>>> If one loads any system beyond its capacity, it will exhibit
>>>> pathological behaviour.
>>>> The common term for this is "thrashing" where
>>>> concurrent processes spend more time on overhead than actual work. It
>>>> can happen in a lock manager. It can happen in a cache. It can 
>>>> happen in
>>>> a virtual memory manager.
>>>>
>>>> All real computer systems have finite capacity.
>>>
>>>
>>> I don't find your generalisation useful.   If faced with a choice I
>>> would always pick a system that maintains a reasonable transaction
>>> throughput rather than one that falls in a heap (all other things being
>>> equal).
>>
>>
>> Since you will never be faced with that choice and will always have to 
>> accept a system that eventually falls in a heap one way or another, I 
>> don't know what else to say. You can either accept reality or face the 
>> consequences of your delusions.

>
>
> Agreed, all systems have limits; when pushed, the system will either
> reject servicing new requests or degrade performance. Too bad.
>
>>
>>>>>> [snip]
>>>>>>
>>>>>>
>>>>>>> In a shared read mode we get the ultimate in concurrency.
>>>>>>
>>>>>>
>>>>>> Shared read/shared write is the ultimate in security. The use of 
>>>>>> the log
>>>>>> to provide multiple concurrent views of uncommitted data gets that 
>>>>>> job done.
>>>>>
>>>>>
>>>>> IMO the conservative locking I propose will lead to far superior
>>>>> performance.   This is a systems programming question, and I can't
>>>>> back up the claim with quantitative results at present.
>>>>
>>>>
>>>> Your opinion doesn't count for much, and I can confidently counter that
>>>> you will never back up the claim with quantitative results except for
>>>> perhaps a tiny special class of applications.
>>>
>>>
>>> You're saying that in your opinion my opinion doesn't count for much.
>>> Does your opinion account for much?   Sorry I couldn't help myself.
>>
>>
>> I merely made a factual observation. You can accept that fact or not. 
>> Opinion has no relevance to it.

>
>
> I agree, opinions count for little; you need to know the facts. And the
> facts are that since you cannot quantitatively prove that your single
> mutex is better than what all the big db vendors have implemented, and
> you confess you're no db guru, your opinion counts for little.
>
>>
>>>>>> [snip]
>>>>>>
>>>>>>> Subject to integrity constraints, mutative work can be fine grained.
>>>>>>> For example, it is not necessary to add a whole family at once to 
>>>>>>> a DB;
>>>>>>> it is possible to add one person at a time.
>>>>>>
>>>>>>
>>>>>> One of the great things about the relational model is set-level
>>>>>> operation. It is not necessary to add one person at a time when 
>>>>>> one can
>>>>>> add a whole family.
>>>>>
>>>>>
>>>>> What I'm saying is that if it's not necessary to add a whole family
>>>>> at a time (according to integrity constraints or atomicity
>>>>> requirements) then it would be silly to design the application that
>>>>> way.
>>>>
>>>>
>>>> What I'm saying is that if it's possible to add the whole family at a
>>>> time (according to integrity constraints or atomicity requirements) 
>>>> then
>>>> it would be silly to design the application to prevent it.
>>>
>>>
>>> We agree on that, but I don't think it's relevant to our discussion.
>>
>>
>> Why then do you propose a silly design that prevents it?

>
>
> Again agreed. The relational model works best by doing set-based
> operations. Actually, in a real db, the cost of adding, say, 5 family
> members in one set operation/transaction, compared to 5 different
> transactions to add 1 at a time, is very different. Adding one at a time
> is way slower, probably more than 4 times as slow as adding all 5 in
> one transaction. Why? Look at how logging works and how the logs have
> to be hard-flushed to disk before concluding the transaction. Don't know
> what this means? Go read the bibliography.
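He is right about the relative cost, and the reason is mechanical: every
commit must force the log tail to stable storage before the transaction
can be acknowledged. A rough POSIX sketch -- illustrative file name and
record format, not any real dbms's log:

    #include <fcntl.h>
    #include <unistd.h>
    #include <string>

    // Append a log record and force it to disk; the fsync dominates.
    void append_and_commit(int log_fd, const std::string& rec) {
        write(log_fd, rec.data(), rec.size());
        fsync(log_fd);  // wait for stable storage: the expensive part
    }

    int main() {
        int fd = open("wal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);

        // Five one-row transactions: five forced flushes.
        for (int i = 0; i < 5; ++i)
            append_and_commit(fd, "INSERT person " + std::to_string(i) + "\n");

        // One five-row transaction: the same bytes, one forced flush.
        std::string batch;
        for (int i = 0; i < 5; ++i)
            batch += "INSERT person " + std::to_string(i) + "\n";
        append_and_commit(fd, batch);

        close(fd);
    }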
>
>>
>>>>> Mutative changes should be applied in as small a transaction as
>>>>> possible in order to promote concurrency and avoid dead-lock.  That is
>>>>> commonly discussed in books on RDB.
>>>>
>>>>
>>>> I agree. At the physical level, the dbms should not hold shared
>>>> resources any longer than absolutely necessary. I also suggest the dbms
>>>> should not hold more shared resources than absolutely necessary.
>>>
>>>
>>> Note that with distributed transactions you may actually prefer to do a
>>> reasonable amount in a transaction because of the significant
>>> per-transaction overheads.
>>
>>
>> Your statement is either a tautology or nonsense. I cannot be bothered 
>> exerting the effort to tell which.
>>
>>
>>>>>> [snip]
>>>>>>
>>>>>> I suggest if you look at any text on concurrency and transactions in
>>>>>> dbmses, you will find your proposal has been well-considered and
>>>>>> long-ago rejected.
>>>>>
>>>>>
>>>>> Call me pig/big headed but I don't always believe what I read!
>>>>
>>>>
>>>> Scientific publications have bibliographies for a reason. A new 
>>>> proposal
>>>> that simply ignores all prior work, such as your proposal does, gets
>>>> rejected without much further thought.
>>>
>>>
>>> I don't ignore prior work.
>>
>>
>> Au contraire.

>
>
> I agree; you come in as a know-it-all systems programmer thinking you
> know better. Sorry, you don't in this case (though you might tell us a
> thing or 2 about systems programming).
>
> > [snip]
>
>>
>>> Note that speculative research that is prepared to throw caution to the
>>> wind can sometimes (albeit rarely) yield great results.
>>
>>
>> Once again, new hypotheses that simply ignore past observations get 
>> ignored. Until you offer any evidence that you have a clue about prior 
>> work, your hypothesis falls into this category. Old hypotheses that 
>> fail to predict new observations get discarded. You have not 
>> identified any such hypothesis.
>>
>> Thus, you have offered nothing useful.
>>
>> I have already given your proposal more time and attention than it 
>> merited. You now have a choice. You can learn enough of the prior work 
>> that you abandon your hypothesis, or you can offer a convincing 
>> argument why the prior work had everything all wrong. Or you can 
>> bounce off the bottom of the killfile with the other cranks who 
>> frequent this newsgroup.
>>

>
> I agree. I replied to this thread because it was amusing, but I don't
> intend to give online db classes to anybody.
>
> Have fun with your research.
>
> By the way, you mentioned multi-branch, distributed, text-editing
> oriented databases that sync their content so people can edit the same
> document in a distributed fashion:
> 2 answers:
> 1) look at ClearCase (has replicated dbs, branching, auto merging; it's
> text oriented (supports binary too, just in case), has extensive
> branching...). It's also a pain to use, but don't ignore prior art.
> 2) look at Google Docs, their Writely acquisition. I had thought about
> doing something like that; they actually did it. I wish I would have
> been part of that team.

If by 'they' you mean Google, then 'they' did not do it. 'They' bought it. If by 'they' you mean the development team at Writely, then they did indeed do it.
