Re: "Transactions are bad, real bad" - discuss

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Wed, 30 Apr 2003 19:17:05 +0100
Message-ID: <b8p42r$1ltu$1_at_gazette.almaden.ibm.com>


"Alfredo Novoa" <alfredo_at_ncs.es> wrote in message news:e4330f45.0304300221.855dfca_at_posting.google.com...
> "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
news:<b8ljgd$383i$3_at_gazette.almaden.ibm.com>...
>
> > Well that is exacly what I am arguing. We do not require transations in
the
> > (logical level of) the relational model. In fact they are positively
harmful.
>
> The only problem I see is if you are in a client server enviroment,
> how can you check if the partial updates are correct?
>
> Imagine the operator is typing a long invoice. He could like to know
> if each item line is correct before typing the next line.
>
> Of course even with transactions he can not be 100% sure, but It will
> work in lots of cases, and it is practical.
>

Good one Alfredo. Another sneaky thing that transactions are supposed to be good for. One of the original justifications for transactions I would imagine.

You know, it occurs to me that one reason transactions achieved such a mind share is that they do seem to offer a solution to many diverse problems. The weight of all these answers has caused people to accept that the concept itself must be a good one, must be fundamental somehow. However, I know of no proof that transactions are fundamental; that the relational model is incomplete without them. They are just taken as granted, and any problems that they cause are just accepted as the way things have to be. No wonder that the database community has been off in a ghetto all this time. How can other computer scientists take us seriously when we promote inessential and harmful concepts is such a manner?

Anyhow, lets see what is really going on in this case and see if transactions really are good for something (but no surprises for guessing the answer already).

I want to offer you some alternate solutions.

  1. Invoice lines are persistent

Each invoice line is stored in the database in a set of tables set aside for 'invoices in progress'. These tables have the relevant constraints on them and as each line is inserted (and committed) into these tables, the constraints are checked. Once an invoice is complete - defined by say a view over the 'invoices in progress' it can be moved into the 'main' database tables.

Frankly this is the best design. It avoids issues such as not defining the constraint

    'Every invoice must have at least one line' in the main database. This kind of omission happens all too frequently in current databases. It is omitted to allow an Invoice to be build 'incrementally', with initially only a invoice header and no invoice lines. The problem is that the constraints on an Invoice are actually more restrictive than the constraint on an Invoice-in-progress

One proposed (I would say hacked) answer to this constraint omission, is the suggestion that tuple and table constraints be enforced at each statement, but database constraints should be checked at transaction commit. Chris Date has written on dbdebunk.com on the fallacy of not checking all constraints at each statement (reduction of semantic optimisation opportunities was one argument), and will write further in his upcoming 8th edition of database systems. I suggest those in favour of SQL style constraint checking read his writings.

B) Invoice lines are not persistent

As above, but the tables set aside for 'invoices in progress' are local, in-memory tables on the client. Again, the constraints on these will be similar, but not identical to those on 'completed invoices'.

On the matter of redundancy between the constraints and table structures in the 'invoices in progress' tables and the 'completed invoices' tables. I say that any exploitation of this redundancy is a matter for the database design tool - it is a data modelling matter. Alternatively, or additionally, it could be seen as a higher-level constraint matter. E.g. no matter the specific schema for 'completed invoices', there will be a one-to-one mapping of attributes and tuple constraints to the schema for 'invoices in progress'.

C) Invoice lines are not persistent, but don't want to design an 'invoices in progress'

Lets say that the constraints on an 'invoice in progress' are identical to those on 'completed invoice'. In this case, if we don't want persistency, we might well want to use the DBMS constraints defined for 'completed invoice', for our 'invoices in progress'. With transactions you can 'simply' build your invoice incrementally in the DBMS, relying on it's statement level constraint checking. This is fine if you accept the downsides of transactions (highlighted elsewhere).
The problem we have is: how do we get to test the DBMS constraints against our 'invoice in progress' without actually doing an update? Now the constraints are available in the catalog, all we need do is pull them out and test them against our proposed update. The quick thinkers here will see that we have a higher-order problem here. The constraints are stored as values in the catalog, we need to 'execute' those values against the other values in the database.

For example, making a higher-order syntax up on the fly...

    EVALUATE (Catalog.Constraints.RelExpresion) OVER (D UNION 'invoice in progress rows')

which would return the Boolean value TRUE, if all constraints evaluate to TRUE when evaluated over the current value of the database D unioned with our proposed update.

Hoping that people have followed me thus far, the above is for me just yet another example of where a critical look at transactions shows that they are inessential, and further, that in fully replacing their functionality within the confines of the relational model, we highlights a number of important aspects of the model hitherto under exposed.

My list so far (newly extended with 'higher-order' queries (albeit ones limited to evaluating values from the catalog)) includes:

    multiple assignment / dbvar assignment     selection-assignment
    multiple relation selection
    catalog update intermixed with relation assignment     higher-order queries using catalog values     'constraint locking'
    triggered external actions
    plus others....

To conclude. Transactions are not essential and not good for this 'partial update' problem. In fact, I believe they are good for nothing (in the relational model).

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Wed Apr 30 2003 - 20:17:05 CEST

Original text of this message