Re: How to update multiple rows atomically

From: paul c <toledobythesea_at_oohay.ac>
Date: Sat, 29 Jul 2006 16:55:12 GMT
Message-ID: <QpMyg.262824$iF6.139224_at_pd7tw2no>


Tony Andrews wrote:
> Marshall wrote:

>> Here's what I came up with later:
>>
>> UPDATE T t1, T t2 set t1.owner = 1, t2.owner = 1 where
>>     t1.owner = 0 and t2.owner = 0 and t1.id = 2 and t2.id = 3;

>
> I find that solution unsatisfying as it is not general. What if there
> were 102 rows you wanted to update rather than just 2? Your first
> attempt looked OK, though apparently not for mySQL. Here is another:
>
> update T set owner=1
> where <condition>
> and not exists
> ( select null from T
> where <condition>
> and owner != 0);
>
> Now you don't even need to know in advance how many rows you want to
> update - e.g. <condition> may be "type = 'WIDGET'".
>

I'm not trying to rain on the SQL parade, nor the bigger parade that marches relentlessly towards greater and greater complexity, but in a way, I find it all unsatisfying. Surely the important context is the program, not the statement. Marshall's mention of Test and Set, which in its first IBM bare-metal incarnation had at least one notorious flaw, gets me going on this. From what little I know of SQL, one must set an 'isolation level', which determines the state of things outside, if you will, the statement boundary. Sometimes programmers don't even know what default isolation level the dba may have assigned to the db.

I would like to be able to read a program without knowing much about the environment nor about arcane, arbitrary concurrency components and know that it will perform my (stated) intentions. The various locking protocols around today mean that the same program behaves differently depending on its product environment. Just my two cents.

The recent so-called surrogate thread touched on what I think is a fairly intractable and fuzzy notion - duration (because durations can overlap), and muddling this up with whatever concurrent programs might be doing. (I know this is deviating from exactly what you guys are talking about.) At the risk of being ridiculed, I'd like to mention my earliest introduction to a way around having to deal with 'duration'. It was an old IBM batch language called RPG that had only six kinds of statements. The ones I remember best were called 'Input', 'Condition' and 'Output' (or somesuch), but I thought of them, sort of, as standing for 'Before' and 'After' and 'In-between'. Being a batch language that used files (also limited in other ways that are considered passe today), an RPG program's updates were completely serialized by the environment.

(RPG was usually thought of as a 'report writer' even though it could update files, but its outputs could be casually compared to what we now think of as relvars. It wasn't alone - there were similar products from other companies, I forget all their names. As a friend liked to remind me, no matter how much things change, Input and Output seem to stay with us.)

This still seems to me to be a clue towards the more natural way to deal with duration - state what is true at the start of an interval and what is true at the end of an interval, ie., a program marks the interval in a way and recognize that both must be true for the program to succeed. What happens during the interval ought to be independent of those assertions. This is what computers allow us to do. If they don't completely mimic reality, so be it. Even if they are digital, what they do is at best a partial analogue, aka an abstraction. If we want to assert things about other concurrent programs, then those assertions ought to be a stated part of our program.

Of course, RPG wasn't originally a db language and AFAICR it didn't recognize concurrency nor more modern lingo like 'transaction' - I'm not trying to say that a program must delineate one transaction, it could denote several. For nested transactions, which certainly have their uses, it doesn't necessarily follow that we need nested programs - I don't see why we couldn't have nested outputs, given the right notation.

(There is another risk in talking this way, which is to cause all kinds of 'noise', as David C puts it, because some will complain the above is nonsense due to the fact that SQL or some product or other or some theory doesn't work that way! So apologies to David in advance.)

p Received on Sat Jul 29 2006 - 18:55:12 CEST

Original text of this message