Re: Theoretical Basis for SELECT FOR UPDATE

From: Roy Hann <specially_at_processed.almost.meat>
Date: Tue, 4 Oct 2005 00:17:45 +0100
Message-ID: <quedna7Mq5mqIdzeRVnyug_at_pipex.net>


"vc" <boston103_at_hotmail.com> wrote in message news:1128365365.731587.247830_at_f14g2000cwb.googlegroups.com...
>
> Roy Hann wrote:
> > "vc" <boston103_at_hotmail.com> wrote in message
> > news:1128362658.869251.34550_at_g49g2000cwa.googlegroups.com...
> >
> > > Moreover, I bet that 'update,update; ' has to use the same mechanism
> > > behinds the scenes, namely deferrable constraints (what else ?), in
> > > order to be possible.
> > >
> > > All in all, the comma separated statements do not appear to bring much
> > > new to the table in comparison to the commit-delimited old style
> > > transactions except for perhaps more concise notation.
> >
> > On the contrary. It (has the capability to) make the entire
> > transaction evident and explicit. When I confront the usual
> > unfamiliar SQL application, with tiny individually meaningless
> > updates sprayed liberally around the place, I have no confidence at
> > all that any given commit that I might add leaves the database in a
> > consistent state.
>
> Isn't it rather a matter of style and programming discipline? You
> can arguably as easily produce spaghetti code with comma-separated
> transactions as with the old-style stuff. The difference, as I said,
> is just notation, and it's not very substantial.

There is a world of difference. For a start you can't create spaghetti code. You would surely be able to write ugly code, and foolish code, and wrong code. But you couldn't loop and branch and do tailspins and all the other nutty crap I spend my life trying to disentangle and simplify. (I'm mostly a performance tuner.)

Another big difference from standard SQL is that from the DBMS's point of view, all it sees is a stream of messages (SQL statements). It has no idea where they came from, and in fact they might come from diverse places: application routines called by routines called by lesser routines, and so on. Working on any given application module you have no idea where it will be invoked, or what SQL will already have been sent. And even if you know today, you can't know what will happen in future. You can't know the context in which your SQL will be executed.

So if the commas are optional, you are right, that is no progress at all. But if they are mandatory--if every complex atomic statement (made possible by the commas) is also a transaction, meaning the database is moved to a new consistent state--then that could be huge progress.

> >Nor that any given rollback will
> > roll back all and only what I expect.
>
> Speaking of which, how does one roll back with new-style transactions?
> Or one is not supposed to unless an error occurs ?
>
> > Where does the transaction begin? Am
> > I entitled to expect to commit here? If a serialization failure occurs,
> > what--exactly--have I lost, and what do I have to try to redo?
>
> When any failure occurs, including serialization, the statement is
> rolled back automatically.

Well let me be a bit more precise. I am assuming ANSI/ISO SQL92 behaviour. In SQL92 a serialization failure does not merely roll back the last statement, it rolls back the entire transaction. Every error check needs to be capable of distinguishing serialization failure from a "normal" error so that the entire transaction can be re-tried. But where the heck does the transaction begin; what do I have to re-try? I might think I know which SQL statement initiated the transaction that just failed, but since transactions start implicitly I'm really just rolling the dice.
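To make that retry obligation concrete, here is a minimal sketch in Python. The "driver" is a stand-in, not any real DBMS API, and the SerializationFailure class is my invention standing in for an error carrying SQLSTATE 40001. The point is that the handler must re-run the whole statement list, because under SQL92 semantics the failure has already rolled all of it back:

```python
class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""

def run_transaction(execute, statements, max_retries=3):
    """Re-run the WHOLE statement list on serialization failure.

    Retrying only the statement that raised would be wrong: the DBMS
    has rolled back everything since the (implicit) transaction start.
    """
    for attempt in range(max_retries):
        try:
            for stmt in statements:
                execute(stmt)
            return attempt + 1          # number of tries it took
        except SerializationFailure:
            continue                    # everything was undone: redo it all
    raise RuntimeError("gave up after %d attempts" % max_retries)

# Fake executor: raises a serialization failure once, then succeeds.
calls = []
def flaky_execute(stmt, _state={"failed": False}):
    calls.append(stmt)
    if stmt.startswith("UPDATE") and not _state["failed"]:
        _state["failed"] = True
        raise SerializationFailure()

tries = run_transaction(flaky_execute,
                        ["INSERT INTO t VALUES (1)", "UPDATE t SET c = 2"])
```

And of course this sketch dodges the very problem I'm complaining about: it only works because the statement list--the transaction's true beginning--is known. With implicit transaction starts scattered through an application, you don't have that list.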

> > Notice that I only say comma separated updates have the capability to do
> > this.
>
> I can concede that the comma separated statement, in some cases
> (although I find it hard to imagine those cases), can prevent errors
> caused by sloppy coding practice.
>
> >As a practitioner I find SQL's eagerness to start a transaction
> > implicitly extremely irritating and a rich source of performance and
> > consistency problems.
>
> Excuse me ? If anything, the default transaction prevents one from
> introducing consistency problems by eliminating implicit commits (the
> SQL Server default mode).

I know nothing about SQL Server. Other products don't behave that way by default. It certainly doesn't sound like a good idea to me. If that is your point of reference then I can see how implicitly starting a transaction would seem to be an improvement. But it's only less bad, not good.

The reason implicitly started transactions cause consistency problems is that you have no idea what you are committing nor what you are rolling back. You can't know (in general) where the transaction began. Ending a transaction is a shot in the dark. Into a crowded room.
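Here is a small concrete version of that shot in the dark, using Python's sqlite3 module (my choice purely for illustration; in its default mode it begins a transaction implicitly before the first data-changing statement, much like the behaviour I'm complaining about). A helper routine's INSERT silently opens the transaction, so a later rollback in code that knows nothing about the helper throws the helper's work away too:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # default mode: implicit BEGIN before DML
conn.execute("CREATE TABLE audit (msg TEXT)")
conn.commit()

def helper():
    # Deep inside some library: this silently starts the transaction.
    conn.execute("INSERT INTO audit VALUES ('helper was here')")

helper()
try:
    conn.execute("INSERT INTO audit VALUES ('my work')")
    raise ValueError("something went wrong in *my* code")
except ValueError:
    conn.rollback()                  # meant to undo only 'my work'...

# ...but the helper's row is gone as well.
survivors = [r[0] for r in conn.execute("SELECT msg FROM audit")]
```

Swap the rollback for a commit and you get the dual problem: you commit the helper's half-finished work along with your own.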

> As to performance problems, could you
> elaborate on how the default transaction causes those ?

Most products prevent you from altering the isolation level or locking strategy in mid-transaction. If a transaction starts and you don't know it, you will try to (for example) relax the isolation level to improve concurrency and it won't work. And even if it does work today, there is no saying some bozo won't stick a new statement in after the last commit, and the attempt to change the isolation level will start failing. It happens all the time. In fact I would be astonished to find any real-world application where stuff like that doesn't happen.
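SQLite has no SET TRANSACTION ISOLATION LEVEL, so as a stand-in here is the same trap sketched with an explicit BEGIN, which likewise must run outside a transaction. Someone's earlier DML statement silently opens a transaction, and the statement that worked yesterday starts failing (the mechanism, not the exact statement, is the point):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c INTEGER)")
conn.commit()

conn.execute("BEGIN IMMEDIATE")            # fine: no transaction open yet
conn.commit()

conn.execute("INSERT INTO t VALUES (1)")   # silently begins a transaction
try:
    conn.execute("BEGIN IMMEDIATE")        # the bozo's new statement above
    failed = False                         # broke this: it now runs inside
except sqlite3.OperationalError:           # an open transaction
    failed = True
conn.rollback()
```

Exactly the same statement, in exactly the same routine; whether it works depends entirely on what some other piece of code sent first.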

A bigger problem however is that by default you are entitled to read-consistency within a transaction. If the particular product you are working with enforces that through locking, and transactions start implicitly, you can easily start one accidentally. And if you don't realize you've started it, you won't know to commit or roll back when you no longer require read-consistency. That is the number one performance killer I see in highly concurrent OLTP apps. (I admit that the nature of my work ensures I only ever see apps with problems though.)
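A sketch of that killer, again with sqlite3 as an illustrative stand-in (it enforces consistency with database-level locks rather than row locks, but the shape of the problem is the same): one connection's implicitly started, never-committed transaction holds a lock, and everyone else's writes start timing out:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, timeout=0.1)
other = sqlite3.connect(path, timeout=0.1)
writer.execute("CREATE TABLE t (c INTEGER)")
writer.commit()

writer.execute("INSERT INTO t VALUES (1)")  # implicit BEGIN; takes a lock
# writer "forgets" to commit: the lock is now held indefinitely.

try:
    other.execute("INSERT INTO t VALUES (2)")
    blocked = False
except sqlite3.OperationalError:            # "database is locked" on timeout
    blocked = True

writer.commit()                             # lock released at last...
other.execute("INSERT INTO t VALUES (2)")   # ...and now this succeeds
other.commit()
```

Multiply the writer by a few hundred concurrent sessions, each holding locks between an accidental transaction start and whenever a commit happens to come along, and you have the OLTP apps I get called in to look at.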

> > Even if a DBMS allowed complex updates to be
> > expressed as atomic operations, it would only solve my problem if it
> > *insisted* they be atomic.
>
> Please clarify.

If an atomic update has to be written as an atomic statement, then it would be possible to go further and insist that a statement *is* a transaction. Once you insist on that, then all the problems about where transactions begin and end go away.
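For a taste of what "a statement *is* a transaction" buys you, sqlite3 in autocommit mode is a fair approximation (the comma-separated syntax itself is hypothetical): each statement stands alone, and a multi-row UPDATE that fails a constraint part-way through leaves no partial effect at all.

```python
import sqlite3

# isolation_level=None: true autocommit, no implicit BEGIN lurking anywhere.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE acct (id INTEGER, bal INTEGER CHECK (bal >= 0))")
conn.execute("INSERT INTO acct VALUES (1, 100), (2, 10)")

try:
    # One statement touching both rows; row 2 would violate the CHECK.
    conn.execute("UPDATE acct SET bal = bal - 50")
except sqlite3.IntegrityError:
    pass

# Neither row changed: the failed statement had no partial effect.
balances = dict(conn.execute("SELECT id, bal FROM acct"))
```

There is nothing to ask about where the transaction began or what to re-try: the statement is the transaction, and it either moved the database to a new consistent state or left it exactly where it was.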

Roy
