Re: ACID et al

From: paul c <toledobythesea_at_oohay.ac>
Date: Tue, 06 Dec 2005 16:52:27 GMT
Message-ID: <fljlf.57085$Eq5.50047_at_pd7tw1no>


David Cressey wrote:
> "paul c" <toledobythesea_at_oohay.ac> wrote in message
> news:xZYkf.52828$Eq5.42138_at_pd7tw1no...
>

>>...

>
>
> PMFJI. I'm in a little over my head here. I understand how locking and
> ACID interact in one or two DBMS products, but not well enough to abstract
> what you are asking about here.
>
> The first thing I want to say is that there are basically two ways of going
> about ACID: synchronization and multiversioning.
>
> "Synchonization" basically means the same thing as locking. One thread
> blocks until another thread completes some action.
> In the rest of the thread, some people thought you meant, "using the lock
> manager" when you said "locking". I surmise that you only meant
> "synchronizing.

Thanks. I wouldn't argue with that interpretation, although I wouldn't put it quite that way. My reason is that I don't want to imply that there is some kind of synchronization component. There isn't.

It also wasn't enough for me to say that the db is single-threaded, aka serial. I should have underlined that, as far as the db engine is concerned, a unit-of-work or transaction or consistent business query or update is bounded by a single logical message (I say 'logical message' because an environment surrounding the db might choose to physically queue fragments of a logical message). So somebody who likes to talk in terms of atomic transactions could equally use the phrase 'atomic message'.
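
Just to put the idea in toy form (all the names below are mine, purely for illustration, not any product's): the engine takes one complete logical message at a time, applies it to a private copy, and either the whole message takes effect or none of it does.

import queue

class Engine:
    """Rough sketch only: names are mine, purely for illustration."""

    def __init__(self):
        self.state = {}              # the in-memory database
        self.inbox = queue.Queue()   # whole logical messages arrive here

    def run_once(self):
        message = self.inbox.get()   # one complete logical message = one unit of work
        working = dict(self.state)   # interpret it against a private copy
        try:
            for key, value in message.items():
                working[key] = value # any failure in here aborts the whole message
        except Exception:
            return False             # the message fails as a whole; nothing changes
        self.state = working         # the message succeeds as a whole
        return True

engine = Engine()
engine.inbox.put({"balance:alice": 70, "balance:bob": 80})
engine.run_once()                    # the 'atomic message'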

>
> It reminds me of when DEC Rdb was ported from VMS, which has a lock manager
> service in the OS, to unix, which AFAIK doesn't have such a service. They
> basically had to put all the synchronization primitives inside Rdb.

I used to ridicule the OS developers for their short-sightedness. (I haven't changed my mind - the only reason I stopped complaining is that it's a waste of time.) In the 1980's and early 1990's (I don't know about now) I followed fairly closely what the db engine developers for various products were doing. One thing they were doing was spending a lot of time extending existing OS facilities such as locking to the db environment. Another was writing low-level disk drivers. The list was a long one. The OS manuals would describe api's for using various OS services, but they were always insufficient for the db's. I found this ironic, since when it came to concurrency both camps were really working on the same problem. And both camps were of the 'not invented here' persuasion.

For me, the problem goes deeper, right down to cpu architecture. The trade press thought it was great when stack-oriented processors became common. Even IBM acceded with their mainframes. But I heard nobody ask the HW people: where's the double-ended stack that makes language interpretation so much easier? So all the compiler/interpreter people had to write their own in software.

Your average politician here (Canada) is so often saying we must have more technology/education funding to increase national productivity, but they haven't a clue how unproductive the IT/high-tech infrastructure already is. Just my rant of the day.

>
> "Multiversioning" basically means keeping every version of every updated
> cell around, until it's certain that no remaining process will need to read
> that version. Whenever anything is read, if there's more than one version,
> the multiversioner figures out which one is "right" for that thread.
> There's more to it than that, because you still have to make sure that
> transactions are atomic.
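
Right - just to make the 'figures out which one is right' part concrete for myself, my rough picture (names and details are mine, not any particular product's) is something like:

def read(versions, reader_start_time):
    """versions: list of (commit_time, value) for one cell, oldest first."""
    visible = [value for commit_time, value in versions
               if commit_time <= reader_start_time]
    return visible[-1] if visible else None

# a cell updated at times 10 and 20: old versions stay around until no
# remaining reader can still need them
history = [(10, "old"), (20, "new")]
print(read(history, 15))   # a reader that started at 15 still sees "old"
print(read(history, 25))   # a reader that started at 25 sees "new"
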
>
> A few other comments:
>
> 1) ... in-memory.
>
> As long as there is a single "server" that keeps part of the database in
> memory,
> and combines the multiple threads in a single process, the distinction
> between an in memory database and a database that's on disk with a cache in
> memory is trivial as far as ACID is concerned. Oracle RDBMS, in the
> simplest configurations, works like this.
>
> If each client process has its own server process, and they manipulate
> multithreading by interprocess communication, it gets lots more
> complicated. DEC Rdb (now Oracle Rdb) works like this.
>
> point 2. Redo log is forced and synced at commit time.
>
> A good point. This is enough to satisfy the D in ACID, as long as you can
> make the contents of the redo log durable, and as long as you can process
> the redo log serially.
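
Agreed. The way I picture it, in toy form (file name and record format are mine, just for illustration): the redo record is appended and synced before the commit is acknowledged, and recovery is a serial replay of the log.

import json
import os

def commit(redo_records, path="redo.log"):
    # append the redo records and force them to disk before acknowledging
    # the commit; a crash after this point can be repaired by replaying
    # the log serially
    with open(path, "a") as log:
        for record in redo_records:
            log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())
    return "committed"

commit([{"relvar": "accounts", "row": "alice", "new_balance": 70}])
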
>
> point 3. Single threaded. This is where I get lost.
>
> It sounds like you are saying that "transactions are serialized", and "that
> transactions are inherently atomic, because they take place in response to
> a single call by the client". If I've got that right, I'm going to call
> your attention to a long and involved recent thread in which some people
> compared SQL, which requires a "START TRANSACTION" (perhaps implicit),
> and a "COMMIT" or "ROLLBACK" to a more developed language, in which an
> entire atomic transaction can always be expressed in the form of a single
> request.

I'll look for that. However, one question I like to ask myself is "what theory and practice would have developed instead of what we have today, had desktop computers with the memory, cpu speed and ethernet bandwidth that we have today been common 35 or 40 years ago?". Around 1970, larger IBM mainframes were lucky if they had several hundred K of memory. Disk capacities ranged from 7-28MB or so with access (seek) times around 100 milliseconds. The typical IBM or DEC terminal (eg. text-only) had NO programmability other than crude character attributes for flashing, underlining, blanking or 'read-only'. Just taking IBM as an example, a lot of the theory that went into System R was influenced by the prior IMS physical performance decisions. I remember seeing some of the same heritage in the DEC, Prime and other 'minicomputers' I had some exposure to in the 1980's. Almost at the same time, it was finding its way onto the desktop microprocessor in products like dBase. Codd was one of the first people to point out, in a really noticeable if indirect way, that theory was mostly being driven by physical practice. We were all victims of this. Much of the current theory, whether multiversioning or caching et cetera, was influenced by the inefficiencies caused by disk latencies, puny memory by today's standards and so forth.

I remember, years ago now, spending quite a lot of time trying to figure out what *ought* to happen when a triggered procedure (equivalent, I think, for this purpose, to a nested transaction) issues a Rollback, or how to undo a nested Commit if a 'parent' transaction finds out it can't complete. For the life of me, I often thought that if I had had the chance to wipe the slate clean and start with the presumption that a request to the db is really a program - a program that can theoretically be as complex as we need, and that can see both the before and after effects before it decides whether to continue or fail - then all those verbs shouldn't really be needed. When I look at the few real examples Gray and Reuter give in their massive book about transactions, I just cannot see a need today for much of the complexity they espouse. I'm not saying that all of the theory is worthless; for example, I think the ACID idea is a pretty good litmus test when we try to design or assess support for transactions. But I think the most common problems can be solved with some pretty simple syntax (maybe R' would stand for the value R will contain after a transaction completes) along with engine support for what I think of as the 'intent' or 'delta' (at a table or 'relvar' level). Folding R's delta into R happens immediately before the redo 'synch'. If the synch fails, the system fails.
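
Roughly what I have in mind, in toy form (the names are mine and nothing here is meant as a finished design):

class Relvar:
    """Toy sketch: R plus a pending 'delta', my own illustration only."""

    def __init__(self, rows):
        self.rows = set(rows)        # R: the current, durable value
        self.delta_insert = set()    # the 'intent' accumulated by the program
        self.delta_delete = set()

    def after(self):
        # R': the value R will contain after the transaction completes
        return (self.rows - self.delta_delete) | self.delta_insert

    def fold(self, redo_log):
        # fold R's delta into R immediately before the redo 'synch';
        # if the synch fails, the system fails - no partial state survives
        redo_log.append((sorted(self.delta_insert), sorted(self.delta_delete)))
        self.rows = self.after()
        self.delta_insert.clear()
        self.delta_delete.clear()

# the 'program' can look at both R and R' before deciding to continue or fail
accounts = Relvar({("alice", 100), ("bob", 50)})
accounts.delta_delete.add(("alice", 100))
accounts.delta_insert.add(("alice", 70))
if all(balance >= 0 for name, balance in accounts.after()):
    accounts.fold(redo_log=[])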

I haven't explained what I mean by 'db program'. One important meaning is that a db 'program' is either all TRUE or NOT (it succeeds or fails). Nothing persistent changes unless it is all TRUE. One implication of this and the 'delta' notation above is that a program has no sequence - any of its individual statements can occur in any order. This is not a new idea - I first saw it in a partial way in RPG many years ago and I'm sure other systems had something similar. I think it's not hard to support in the in-memory db I have in mind.
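
A tiny illustration of what I mean, again in my own made-up notation: each statement is just a predicate over the before value and the proposed after value, there is no order among the statements, and the whole program either holds or it doesn't.

def run_program(before, proposed, statements):
    # the program is all TRUE or NOT: if every statement holds, the proposed
    # state becomes persistent; otherwise nothing changes
    if all(statement(before, proposed) for statement in statements):
        return proposed
    return before

before = {"alice": 100, "bob": 50}
proposed = {"alice": 70, "bob": 80}

# no sequence among these - they can be evaluated in any order
statements = [
    lambda r, r_after: r_after["alice"] == r["alice"] - 30,
    lambda r, r_after: r_after["bob"] == r["bob"] + 30,
    lambda r, r_after: all(v >= 0 for v in r_after.values()),
]

print(run_program(before, proposed, statements))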

cheers,
p