Re: "Transactions are bad, real bad" - discuss

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Tue, 29 Apr 2003 20:47:28 +0100
Message-ID: <b8ml4q$1da8$1_at_gazette.almaden.ibm.com>


"Costin Cozianu" <c_cozianu_at_hotmail.com> wrote in message news:b8mecc$bnbun$1_at_ID-152540.news.dfncis.de...
> Ok, let's say you want only single statement updates,

Agreed.

> with single
> statement supporting multiple database operations.

No. A single statement performs a single 'update' - an 'update' to the database value. Strictly speaking, we are replacing the current value of the database variable (dbvar) with a new value. If you take the current dbvar value, delete a few tuples, insert a few tuples and update a few tuples, all across many relations, and then assign the result to the dbvar as the new database value, you have still performed only one database operation.

Remember, INSERT, DELETE and UPDATE are not primitives; only assignment is.
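The idea can be sketched in a few lines of Python (my illustration, not anything from the original discussion): treat the database value as an immutable mapping from relation names to sets of tuples, so that a "delete plus insert" is still just one computation followed by one assignment.

```python
# Illustrative sketch: a database value as an immutable mapping from
# relation names to frozensets of tuples. INSERT, DELETE and UPDATE are
# all derived operations; the only primitive is assignment of a whole
# new value to the database variable (dbvar).
from types import MappingProxyType

def db_value(relations):
    """Freeze a dict of {relation_name: set_of_tuples} into a database value."""
    return MappingProxyType({name: frozenset(tuples)
                             for name, tuples in relations.items()})

dbvar = db_value({"emp": {("alice", 10), ("bob", 20)}})

# A "delete plus insert" is still one database operation: compute the
# new relation value off to the side, then assign once.
new_emp = (dbvar["emp"] - {("bob", 20)}) | {("carol", 30)}
dbvar = db_value({**dbvar, "emp": new_emp})   # the single assignment
```

However many tuples change, and in however many relations, the dbvar makes exactly one transition from its old value to its new one.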

> Now, how do you design a client server protocol, in which the client
> programmer wants to decide later in the host language some based on the
> results from database and other possibly external conditions, what is
> the next database update to be executed (of course in a transaction-like
> mode if you claim you don't support transactions).

I think it would be most productive if you could give an example of such a scenario.

My short answer is: if the 'client programmer' wants to freeze time, then tough - that is not allowed in this model of reality.

> How do you integrate databases and others in a distributed transaction ?

The job of integrating databases, and of supporting distributed databases, is a systems issue; it's at the *physical level* of the model. Obviously I'm not saying that the general concept of a transaction is not needed at such levels, I'm just saying that it is really bad to allow such things to poke through to the logical, model level.

If you want a distributed database, then the DBMS will need to support transactions under the covers, but the users don't need to see this; they just update their database value as they would with a single database.

> How do you allow for rollback, possibly even in the middle of an update
> operation ?

At what level? As far as users are concerned, there is no 'middle of an update operation'. If the DBMS runs out of disk space mid-update, then it's business as usual: the update gets backed out and an error message is returned to the users.
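That model of "rollback" can be sketched as follows (a hypothetical Python illustration of my own, with made-up names like `try_update`, not anything proposed in the thread): the new database value is computed off to the side, and if anything goes wrong the assignment simply never happens, so the dbvar is untouched.

```python
# Sketch: rollback as "the assignment never happens". The candidate value
# is built away from the live dbvar; only a successful, constraint-checked
# result is ever assigned.
def try_update(dbvar, compute_new_value, check):
    """Return (new_value, None) on success, or (old_value, error_message)."""
    try:
        candidate = compute_new_value(dbvar)   # build the new value off to the side
    except Exception as e:                     # e.g. out of disk space mid-update
        return dbvar, "update backed out: %s" % e
    if not check(candidate):                   # integrity constraints on the result
        return dbvar, "update rejected: constraint violation"
    return candidate, None                     # the one atomic assignment succeeds

dbvar = frozenset({("a", 100), ("b", 50)})

# A failing update: the constraint check rejects it, and the caller gets
# back the old value plus an error message - dbvar is exactly as it was.
new_db, err = try_update(dbvar,
                         lambda db: db | {("a", -1)},
                         lambda db: all(bal >= 0 for _, bal in db))
```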

> Related to the above, what is your model for concurrency control ? Do
> you only allow for fully serializable executions ?

Well, at the logical level, obviously. At the physical level, obviously not - just as long as the illusion (i.e. the actuality as far as the users are concerned) is full serializability: the dbvar moves atomically from one value to the next, in the same sequence that the CURRENT TIMESTAMP given for each user's query or update dictates.
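A minimal sketch of that illusion (my construction, assuming a simple lock and a version counter standing in for the commit timestamp):

```python
# Sketch: the dbvar makes atomic transitions from one value to the next,
# and the versions (timestamps) handed out agree with the order in which
# the transitions actually happen. Readers never see an intermediate state.
import threading

class DbVar:
    """A database variable whose only update operation is atomic assignment."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()
        self._version = 0          # stands in for the commit timestamp

    def read(self):
        with self._lock:
            return self._value, self._version

    def assign(self, compute):
        # One atomic transition per statement; version order is
        # serialization order by construction.
        with self._lock:
            self._value = compute(self._value)
            self._version += 1
            return self._version

db = DbVar(frozenset({1, 2}))
v1 = db.assign(lambda s: s | {3})
v2 = db.assign(lambda s: s - {1})
value, version = db.read()
```

A real DBMS would of course run physical work concurrently under the covers; the point is only that users can never observe anything other than this serial sequence of values.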

> How is database going
> to decide that, if instead of a piecemeal statement by statement
> transaction, it will have to wait somehow for the full single statement
> behemoth ?

This is no different from today. Well, it is different in that many existing DBMS systems do not respect the rule that transaction timestamp order needs to agree with transaction serialization order. Banning transactions makes keeping to this rule much easier to implement. Respecting this rule does deprive us of a possible hack to improve performance, but because we don't have transactions waiting on external code (or worse, external events) we don't have such serious concurrency issues. Anyhow, is it better to be fast or to be correct?

To be fair, I've not investigated this aspect all that thoroughly, but I only see benefits here.

First, the DBMS knows the full extent of the database that any statement will touch; with a transaction, the full scope is usually unknown at the beginning of the transaction.

Second, transactions encourage users to alter data in place and in a piecemeal fashion, whereas banning transactions encourages users to build up data elsewhere and then apply it in one go. This frees the existing data from contention and helps concurrency.

Third, for updates that can be proved to be correct (e.g. divide by zero not possible, duplicate violation not possible, ...), long-running update statements (say DELETE FROM BigTable WHERE Date + 1 MONTH < TODAY) can actually be *included* in subsequent queries/updates. So SELECT COUNT(*) FROM BigTable becomes SELECT COUNT(*) FROM BigTable WHERE NOT (Date + 1 MONTH < TODAY), while the big delete is still running...
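The third point can be sketched concretely (a hypothetical Python illustration with made-up data; '1 month' is approximated as 30 days for simplicity):

```python
# Sketch: while a long delete with a known predicate is still running,
# a later COUNT(*) can be rewritten on the fly to exclude the doomed
# rows, as if the delete had already completed.
from datetime import date, timedelta

# Hypothetical BigTable: one row per day starting 2003-01-01.
big_table = [{"id": i, "date": date(2003, 1, 1) + timedelta(days=i)}
             for i in range(120)]
today = date(2003, 4, 29)

def doomed(row):
    # Predicate of the still-running delete: Date + 1 month < TODAY
    # (approximating one month as 30 days for this sketch).
    return row["date"] + timedelta(days=30) < today

# SELECT COUNT(*) FROM BigTable, rewritten to
# SELECT COUNT(*) FROM BigTable WHERE NOT (Date + 1 MONTH < TODAY):
count = sum(1 for row in big_table if not doomed(row))
```

The query sees the database value the delete will produce, without waiting for the physical delete to finish.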

BTW, if you could come up with an example where you think a 'large' single statement will underperform, or provide worse concurrency than many statements within a transaction, then that could be useful to our discussions.

> This is just to begin with, depending on your responses we'll see what
> follows...

Rubbing my hands ;-)

Regards
Paul Vernon
Business Intelligence, IBM Global Services

Received on Tue Apr 29 2003 - 21:47:28 CEST