Re: Theoretical Basis for SELECT FOR UPDATE

From: Roy Hann <specially_at_processed.almost.meat>
Date: Tue, 4 Oct 2005 09:51:26 +0100
Message-ID: <AdudnY2lqs8y39_eRVnyvg_at_pipex.net>

"vc" <boston103_at_hotmail.com> wrote in message news:1128389900.766332.93980_at_g43g2000cwa.googlegroups.com...

A number of your remarks seem ill-informed. I understand that this is a theory group, but theory has to be informed about the real world. Theory is only a systematic way of thinking about what we do (or could do) in practice. It is not an excuse to ignore inconvenient truths. I won't criticize you if you don't have any real-world experience; nobody is born with it. But I would like to know if you have ever worked on any real-world, large-scale applications, maybe involving hundreds of programmers over a period of 10 or more years? I think it is particularly relevant when discussing databases because they can be enormously long-lived.

> > There is a world of difference. For a start you can't create spaghetti
> > code. You would surely be able to write ugly code, and foolish code,
and
> > wrong code. But you couldn't loop and branch and do tailspins and all
the
> > other nutty crap I spend my life trying to disentangle and simplify.
(I'm
> > mostly a performance tuner.)
>
> OK, apparently you are talking about coding style issues. I am still
> not convinced that there is much difference between s1,s2; and
> s1;s2;commit, but so be it, let's skip it.

No, let's not. The cost of developing and maintaining spaghetti code is a significant fraction of the maintenance cost of real systems. I sense that you are seeing mere syntactical sugar here. These complex atomic statements are more like green leafy vegetables. They are good for you, in ways I have already hinted at. They are not just cosmetic.

They are also important for another compelling reason I didn't mention before. Unless you have a way to tell the DBMS the entire series of operations you intend to execute, it is forced to do only peephole optimization on the (possibly trivial) individual steps rather than the whole problem. If you could tell the DBMS everything you intend to do, it could (in principle) do a global optimization. Having spent more than 15 years trying to breath life into brain-dead row-at-a-time cursor driven applications in which the programmer actually hard-codes the execution strategy, I can assure that not being able to do that is a huge problem. I routinely see programs that once ran for days reduced to run in hours and sometimes minutes once the optimizer is allowed to see the whole problem.

> > > When any failure occurs, including serialization, the statement is
> > > rolled back automatically.
> >
> > Well let me be a bit more precise. I am assuming ANSI/ISO SQL92
behaviour.
> > In SQL92 a serialization failure does not merely roll back the last
> > statement, it rolls back the entire transaction. Every error check
needs to
> > be capable of distinguishing serialization failure from a "normal" error
so
> > that the entire transaction can be re-tried.
>
> You may have a point about the serialization failure, but popular
> databases like SQL Server/Sybase or Oracle either do not implement
> serialization failure, or rollback only the statement that caused the
> error.

??Huh?!

> (SQL Server simply dead-locks when a serialization failure
> happens so probably you can say it does implement it). The user can of
> course detect the error and rollback the entire transaction.

This is wildly wrong. Deadlock *is* a serialization failure. There is no valid response to it other than a rollback. The system doesn't give the programmer (or "user" if you will) a choice about rolling back. It just happens, and you have to know it happened and know what to do about it. And I say I *can't* know what to do about it.

> >But where the heck does the
> > transaction begin; what do I have to re-try? I might think I know which
SQL
> > statement initiated the transaction that just failed, but since
transactions
> > start implicitly I'm really just rolling the dice.
>
> That's a strange question indeed. You designed and wrote the
> application so you are supposed to know how the application was
> structured into transactions, no ? It's like asking, how can I figure
> out this tree traversal algorithm if all I see is a bunch of IFs and
> WHILEs and some assignments.

"You" here is not me. It is a large team of more-or-possibly-less well-trained programmers, working over a period of 10 or 20 years. They have been roaming about the code plying their trade with varying degrees of skill and intelligence for a long time. No one person any longer has any idea what is going on where. All because transactions start implicitly and are described only in scattered fragments. Whether you are in a transaction at any given point in the code depends on the path you traversed to get there. It can vary from iteration to iteration and end-user to end-user. It's an irredeemable mess because the SQL syntax allows (indeed encourages) it to be a mess. Complex atomic statements would make that problem go away.

> Also, you did not answer how one is supposed to rollback explicitly
> in the comma delimited sequence.

I can't think why I would ever want to. I want to roll back only if there is a constraint violation or there is a serialization failure, and that is all. In both cases the rollback would be implicit, as it should be.

> [...]
> > > Excuse me ? If anything, the default transaction prevents one from
> > > introducing consistency problems by eliminating implicit commits (the
> > > SQL Server default mode).
> >
> > I know nothing about SQL Server. Other products don't behave that way
by
> > default. It certainly doesn't sound like a good idea to me. If that is
> > your point of reference then I can see how implicitly starting a
transaction
> > would seem to be an improvement. But it's only less bad, not good.
> >
> > The reason implicitly started transactions cause consistency problems is
> > because you have no idea what you are committing nor what you are
rolling
> > back.
>
> Well, it seems as if the applications you are dealing with are just a
> random collection of SQL statements without any structure.

Yes! Now you see the problem.

> It's very
> easy to know where the transaction begins (with the first executable
> statement) and where it ends (with a commit or a rollback).

You wouldn't find it "very easy" to know where the transactions begin in one my clients' systems. It is the national integrated taxation system of a small but prosperous European country. It has been under development for almost 10 years. The number of programmers involved is well over a thousand by now. The number of lines of code (COBOL) is well into the multiple millions. The number of copybooks and procedures and externals of all kinds is into the many hundreds.

Of course if you still think it would be easy, we could discuss the terms of a wager to settle the matter! :-)

> That's all
> there is to it, really simple.

You are making me smile.

> If you wish to indicate where in a
> piece of code the transaction begins, you can use a) comments b)db
> proprietary delimiters, 'begin transaction ' in SQL Server or 'set
> transaction' in Oracle.

Your "a)" is naive and silly. Your "b)" is more realistic, but as you point out, it is proprietary. And ultimately it doesn't solve the problem because while you might see the "begin transaction" statement somewhere, you can't know (in general) what happens between that point and your real focus of attention somewhere in some downstream section of code. Nor does it provide a way to describe the full intention to the DBMS so it can generate a globally optimized execution plan.

> Besides, how different is the new syntax
> where the transaction also begins with the first executable statement:
> update,delete,delete,insert..; ? What's the difference from
> update;delete;delete,insert..;commit; ?
>
> >You can't know (in general) where the transaction began. Ending a
> > transaction is a shot in the dark. Into a crowded room.
>
> Does not make sense. Please see above.

I can only suppose it doesn't make sense to you because you've never looked at a real large-scale system. That's no criticism, but I'd like to know.

> > > As to performance problems, could you
> > > elaborate on how the default transaction causes those ?
> >
> > Most products prevent you from altering the isolation level or locking
> > strategy in mid-transaction.
>
> And for a good reason: to prevent chaos.

Agreed.

> Why would you want to change
> the locking stategy or isolation level in the middle of a transaction ?

That is exactly what I don't want to do. I want to change the isolation level between transactions and *know* that it will work (which requires me to *know* I am not in a transaction at that point).

> [snip]
>
> >If a transaction starts and you don't know it,
> > you will try to (for example) relax the isolation level to improve
> > concurrency and it won't work.
>
> How can you not know when a transaction starts, as the application
> author ?

See above.

> 'Relaxing' the i.l. also requires very careful design and
> understanding the concurrency mechanism, not just pushing buttons
> randomly.

Please. You are in danger of teaching your granny to suck eggs.

> [snip]
>
> > A bigger problem however is that by default you are entitled to
> > read-consistency within a transaction.
>
> Apparently, you've meant the READ COMMITTED i.l., the default i.l.

I am entitled to read-consistency with an isolation level of REPEATABLE READ and SERIALIZABLE.

> >If the particular product you are
> > working with enforces that through locking, and you start a transaction
> > implicitly, you can easily do it accidentally. And if you don't realize
> > you've started it,
>
> If someone 'does not realize he has started it', he'd better read a
> manual or something, it's, like ABC of transaction management.

That is just silly. That is exactly like have a building code that says it's OK to have bare wires hanging out the light switch as long as you put up a sign that says "be careful not to touch the wires". I'd prefer a building code that requires the wires to be installed so they can't be touched.

> You commit when it makes sense to commit from the transactional point
> of view, in order to ensure data consistency, not to improve
> performance.

Oh I completely agree with that. But many products use locking to create consistency, and locking takes data off-line for the use of the locker(s). You need to commit/rollback to put the data back on-line. That's bad I agree, but it's how real products force you to work.

> Admittedly, people do that (release locks by committing
> prematurely), especially with locking databases, but it does not make
> it right, does it ?

I enthusiastically agree.

> [...]
> > > > Even if a DBMS allowed complex updates to be
> > > > expressed as atomic operations, it would only solve my problem if it
> > > > *insisted* they be atomic.
> > >
> > > Please clarify.
> >
> > If an atomic update has to be written as an atomic statement, then it
would
> > be possible to go further and insist that a statement *is* a
transaction.
>
> But in every database on the earth, a statement (update/delete/insert)
> *is* already atomic and can be treated as a transaction (when run at an
> appropriate i.l.). Am I missing something ?

Evidently. No real-world business transaction can faithfully be represented by updating one row in one table at a time. In general an SQL transaction (i.e. a series of incremental changes that move the database to a new consistent state) will consist of dozens or even many hundreds of update/delete/inserts.

> > Once you insist on that, then all the problems about where transactions
> > begin and end go away.
>
> Well, I do not think there is any problem with that now. A "for
> example" would be nice.

I couldn't possibly prove my assertion with a mere example, so I am willing to be told I have to retract it. I am not a theoretician. I see myself only as a practitioner who would like better tools. I don't like what I have right now, and no one can tell me what I have right now is adequate because it just isn't. What I have right now is very badly broken.

Roy Received on Tue Oct 04 2005 - 10:51:26 CEST

This message: [ Message body ]
Next message: peter.koch.larsen_at_gmail.com: "Re: Theoretical Basis for SELECT FOR UPDATE"
Previous message: Bob Stearns: "Re: Nested sort, trying again"
Maybe in reply to: Tony Andrews: "Re: Theoretical Basis for SELECT FOR UPDATE"
In reply to Roy Hann: "Re: Theoretical Basis for SELECT FOR UPDATE"
Next in thread: vc: "Re: Theoretical Basis for SELECT FOR UPDATE"
Reply: vc: "Re: Theoretical Basis for SELECT FOR UPDATE"
Reply: Roy Hann: "Re: Theoretical Basis for SELECT FOR UPDATE"
Reply: David Cressey: "Re: Theoretical Basis for SELECT FOR UPDATE"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message