Re: "Transactions are bad, real bad" - discuss

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Wed, 30 Apr 2003 19:39:51 +0100
Message-ID: <b8p5dh$3356$1_at_gazette.almaden.ibm.com>


"Costin Cozianu" <c_cozianu_at_hotmail.com> wrote in message news:b8mqpa$bf0u1$1_at_ID-152540.news.dfncis.de...
> Paul Vernon wrote:
>> A single statement performs a single 'update' - a 'update' to the database
> > value. Strictly speaking we are replacing the current value of the
database
> > variable (dbvar) with a new value. If you wish to take the current dbvar
> > value, delete a few tuples, insert a few tuples and update a few tuples
all in
> > many relations, then assign the result to the dbvar to be the new dabase
> > value, you have still only performed one database operation.
> >
>
> That's a useful abstraction for you to claim that your model can be
> sustained,

Yes, thank you. I didn't mention dbvar's in my draft paper on transactions. It's a separate concept/abstraction, but I pulled it in here because I think it works well to prove that there are no (correct) updates possible with transactions that are not also possible without them.

> it's not a useful abstraction for me at all as the user of
> the database.

Abstraction usefulness is in the eye of the beholder. Teaching people about the database model, I happen to think it is clarifying to talk about the database as a single variable, with the traditional INSERT/UPDATE/DELETE ops just short hands for assignment operations. I think Date admits regret (in the Temporal book?) that TTM decided (as noted in a foot note) to refrain from explicitly using the concept.

> How you're going to transport or refer to the new database value in a
> client server protocol ?

I wish your question was clearer to me. Are you worried about data volumes if users starting pulling the whole database value to a local machine? I hope my answers latter help matters.

BTW Was it Larry Ellison that said, 'Client Server' was a big mistake? I doubt that I would often I agree with what he says, but this time I'm on board.

> It's not making sense at all.

Bare with it. You will get there ;-)

> What I do with INSERT , UPDATE, DELETE SELECT is that I apply a
> transformation to the old database value. I couldn't care less most of
> the times what was the old value and what is the new value, only some
> properties thereof.

Absolutely. Generally you just want to know a few properties, not the whole value. Not least because the whole database value is usually pretty big.

> As a consequence I'm not receiving database values nor am I sending
> database values,

Agreed (in general).
You could of course, send a database value if you wanted to set the database to a new value currently held only locally.

> I am applying operations.

Absolutely. Assignment operations (or shorthands for them). Remember that it is the DBMS that will do the assignment, you just transmit the statement.

> Some operations are a
> composition of many simpler operations.

OK, let me introduce the following to the argument.

  • Nestable INSERT/UPDATE/DELETEs **

On page455 of Intro to DB Systems (7th Ed), Date gives an simple motivating example in favour of transactions. I don't have it to hand but my answer is to make the INSERT/UPDATE/DELETE shorthands nestable. This would make the following valid syntax in Turoiral-D

P := UPDATE (

            DELETE P WHERE COLOR := COLOR('Red')
     )

    WHERE WEIGHT < WEIGHT( 15 )
  { CITY := 'Paris',
   COLOR := COLOR('Blue') } ;

I don't have time to further explain the details, but as you can see this is one statement built from other, simpler statements.

> This composition should obey the
> properties of transactions,

The composition should be atomic. Agreed. Nesting the operations in one statement make this atomicicity explicit without the need to resort to the extra concept of a 'transaction'.

> and should support some concurrency control
> features.

Such as?

> Otherwise, I won't be using your model at all, no thank you very much.
>
> > Remember INSET, DELETE and UPDATE are not primitives, only assignment is.
> >
>
> Quite. +,-,*,/ are not primitive only *succ* is succ(0)=1, succ(1)=2 ...
> , + n 0 = n, + n succ(k) = succ( + n k), ....
>
> Let's set aside programming languages for a while and let's joyfully
> program in Peanno's arithmetic, with a little ambition we can do it :)

Personally, I'd be more than comfortable working in a system that did not support INSERT/UPDATE/DELETE shorthands. OK, so UPDATEs are slightly painful to write out long hand, but INSERT, DELETE are trivial (almost shorter when written as an assignment).

Do you prefer

    INSERT INTO Parts VALUES (P34, 'A shiny new Hammer') or

    Parts := Parts UNION VALUES (P34, 'A shiny new Hammer') ?

    DELETE FROM Parts WHERE PartName = 'Dull trans action 8' or

    Parts := Parts MINUS (Parts WHERE PartName = 'Dull trans action 8') ?

Not a great deal in it if you ask me.

Having said that, I probably agree with Date & Darwen, I/U/D macros are usefull.

> >
> >
> >>Now, how do you design a client server protocol, in which the client
> >>programmer wants to decide later in the host language some based on the
> >>results from database and other possibly external conditions, what is
> >>the next database update to be executed (of course in a transaction-like
> >>mode if you claim you don't support tansaction).
> >
> >
> > I think it would be most productive if you could give an example of such a
> > scenario.
> >
>
> I'm sorry it is you who make claims here. If you don't have a systematic
> approach to address my needs, and claim you have a model that covers
> everything you're on the loosing side.

I tried last time to convince Jan that the burden of proof was on him not me, didn't work.
If we were back in 1971, and we had this nice, clean relational model and you came to me and said, but wait, we need this 'new fangled' concept called transactions here. I would have asked you why these would be needed in my clean model, asked for the single case where my clean model could not provide for something that your transactions could. I would not have had to think of every possible 'feature' of your new concept and ask, can my model do that 'feature' more cleanly, more elegantly.

(OK, so were are not in 1971...)

> Here's a deal: I reserve a sum to be debited against a client account
> (the sum is not debited yet, just the available balance reflects the
> reservation - i.e. the client has committed to pay). Based on some other
> happenings in the client software (let's say I cannot performa the
> service, or I can, etc), I want either to go ahead with the debit, or
> cancel the transaction (ROLLBACK). Compose several of these happening
>
> BEGIN
> db->DB_STATEMENT1
> client->ACTION1
> db->DB_STATEMENT2
> client->ACTION2
> db->STATEMENT3
> END
>
> Where each consecutive action depends on some results of the previous
> action, and at each point in time client should be able to decide a
> rollback.

It seductive isn't it, this rollback ability. A counter example.

db->DB_STATEMENT0 INSERT INTO 3G_Auction_Bid ('Network2', $11Billion') BEGIN TXN
db->DB_STATEMENT1 INSERT INTO 3G_Auction_Bid ('Network2', $12Billion')

client->ACTION1                Wait for 2 mins // end of the auction //
db->DB_STATEMENT2    SELECT winning_bid FROM 3G_Auction_Result
client->ACTION2               If winning_bid > $11Billion THEN
db->DB_STATEMENT3        COMMIT ELSE ROLLBACK END
END TXN Would you allow that in your DBMS? If not how to avoid it? Vet all applications using your DBMS! What an overhead.

P.S I've quite a few points to say in relation to your example above, and I know my counter example is contrived. But time presses.

> > My short answer is, if the 'client programmer' want to freeze time then
> > tough - that is not allowed in this model of reality.
> >
>
> Tough luck indeed. So we are back to using better models of reality.
> Like transactions.

One of my key points is that

Rollbacks are not a good model of the world. I can think of no example where I can perform some actions in the real world and then decide that I did not actually want to do them and so roll them back. I cannot do things and expect others to not be able to see them until I'm happy and commit them. Heck I wish I could, but we cannot freeze time, nor wind time back and undo what we have done. In reality we reverse things by taking compensating actions. I if withdraw too much money from my bank, I cannot 'roll back' my action, rather I need to deposit some of it back.

If I put my hand in the air, but withdraw it before a count is taken, my indecision was still visible to those around me. A database designed to mimic this situation could use transactions, but a better approach would be to record the state of my hand as it waivers up and down then count my vote at the finial decision time. If we wanted to hide such wavering from some authority, then that authority would only be given a database view that included the state of the hands at decision time. Just as a vote counter might turn their back until the time to count the votes arrives.

> >
> >
> >>How do you integrate databases and others in a distributed transaction ?
> >
> >
> > The job of integrating databases and of supporting distributed databases
is a
> > systems issue, its at the *physical level* of the model. Obviously I'm not
> > saying that the general concept of a transaction is not needed at such
levels,
> > I'm just saying that is real bad to allow such things to poke through to
the
> > logical, model level.
> >
> > If you want a distributed databases, then the DMBS will need to support
> > transactions under the covers, but the users don't need to see this, they
just
> > update their database value as they would with a single database.
> >
>
> Well, another reality check for you: not all participants in distributed
> systems are databases. Even if some are databases not all need to
> support relational algebra.

Every 'system' participating in a distributed relational database, needs to be able to masquerade as a database, needs to support the common relational algebra. This is trivially true.

Would my transactionless model make such masquerading more difficult for existing systems? Pass. Is it more difficult to add transactions to a non-transactional system, or to hide all evidence of transactions from a transactional system? Pass (although I suspect the latter).

If you have a 'distributed system' that is not trying to be a distributed relational database, then you can use whatever logical model you so desire, including say a logical concept of transactions.

> Therefore you cannot say
>
> BIG_VALUE_OFTHE_WHOLE WORLD := NEW_VALUE_OF_THE_WHOLE_WORLD
If I had a distributed relational database that supported my model, then yes, I darn well could do exaclty that. If I could not do that, it would not be a distributed relational database that supported my model.

> Imagine the scenario above with DB and client, compounded by several
> participating resource managers.

Merely implementation issues (assuming you would want to attempt such a task with existing systems. I'd just migrate any existing data and business rules to some new, easier to integrate, databases)

> >
> >
> >>How do you allow for rollback, possibly even in the middle of an update
> >>operation ?
> >
> >
> > At what level? As far as users are concerned, there is no 'middle of an
update
> > operation'. If the DBMS runs out of disk space mid update, then it's
business
> > as usual; the update gets backed out and an error message returned to the
> > users.
> >
>
> The middle of an update means exactly what I said the middle of an
> update.

?

> I perform an update, I get some results back, do a computation
> in the client, depending on my results, I'll issue some more updates, or
> I want to issue a rollback.

I count at least 2 updates there.

> >
> >
> >>Related to the above, what is your model for concurrency control ? Do
> >>you only allow for fully serializable executions ?
> >
> >
> > Well (at the logical level) obviously. At the physical level, obviously
not -
> > just as long as the illusion, i.e. the actuality as far as the users are
> > concerned, is full serializability. That the dbvar moves atomically from
one
> > value to the next in the same sequence that the CURRENT TIMESTAMP given
for
> > each user's query or update dictates.
> >
> >
> >>How is database going
> >>to decide that, if instead of a peacemeal statement by statement
> >>transaction, it will have to wait somehow for the full single statement
> >>behemoth ?
> >
> >
> > This is no different from today. Well it is different as many existing
DBMS
> > systems do not respect the rule that transaction timestamp order needs to
> > agrees with transaction serialization order. Banning transactions makes
> > keeping to this rule much easier to implement.
>
> Yes. You just shifted the burden to the client programmer.

Which burden? Although I probably agree with you. My model does, in places, require more work by users (and removes work from them in other places), but all the extra work is work to fully define the business rules of their database, rather than delegating some of the rules (such as choose of locking protocol) to the DBMS and it's often arbitrary and inflexible defaults and options.

> > Not ignoring this rule does
> > deprive us of a possible hack to improve performance. but because we don't
> > have transactions waiting on external code (or worse, external events) we
> > don't have such serious concurrency issues. Anyhow is it better to be fast
or
> > correct?
> >
> > To be fair, I've not investigated this aspect all that thoroughly, but I
only
> > see benefits here. For example, the DBMS knows the full extent of the
database
> > that any statement will touch. With a transaction, the full scope is
usually
> > unknown at the beginning of the transaction. Second, transactions
encourage
> > users to alter data in place and in a piecemeal fashion, banning
transactions
> > encourages users to build up data elsewhere then apply it in one go. This
> > frees the existing data from contention and helps concurrency. Third, for
> > updates that can be proved to be correct (e.g. divide by zero not
possible,
> > duplicate violation not possible, ...), long running update statements
(say
> > DELETE FROM BigTable WHERE Date + 1 Month < TODAY) can actually be
*included*
> > in subsequent queries/updates. So SELECT COUNT(*) FROM BigTable becomes
SELECT
> > COUNT(*) FROM BigTable WHERE Date + 1 Month < TODAY, while the big delete
is
> > still running...
> >
> > BTW If you could come up with an example of where you think a 'large'
single
> > statement will under perform, or provide worse concurrency than many
> > statements within a transaction, then that could be useful to our
discussions.
> >
>
> No, you made a claim, you have to provide at least as good mechanism as
> transactions do.

Agreed. I do provide a good as mechanism, well it's a lot better actually. In the case at hand, my mechanism is single atomic statements. To discuss the pros and cons of concurrency and performance, we need at least generic examples. In principle, as my model could (being generous) be a subset of a model with transactions, the performance of a txn model can then only be greater if users are encouraged use transactions even when everything they want can be done without them. To say that a txn model has inherently better concurrency or performance is incorrect given this view.

> You have to do that in a consistent and systematic
> fashion, it's not like: "give me this particular use-case, I'll program
> something ad-hock,and see, it can be done without transactions".

Agreed, although concrete examples can be very clarifying.

> I know
> it can be done without transactions, by reinventing some wheels, that is.

I would never knowingly reinvent a wheel. Only replace a cartwheel with a pneumatic tyre.

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Wed Apr 30 2003 - 20:39:51 CEST

Original text of this message