Re: Transactions: good or bad?

From: Todd Bandrowsky <anakin_at_unitedsoftworks.com>
Date: 19 May 2003 19:52:34 -0700
Message-ID: <af3d9224.0305191852.443b80c_at_posting.google.com>


> This depends of course on how the implementation decides to serialise things.

If you want to have a logically consistent view of time, then, the implementation must implement some form of serialization.

Assume you have two concurrent overlapping statements:

user a: set profile foo = bar[ 5/1/2003 - 10/1/2003 ] + 72;

user b: set profile bar = foo[ 4/1/2003 to 6/1/2003 ] + 12;

The above statements mean: foo is being set to the extracted set of values of bar between 5/1/2003 and 10/1/2003, each plus 72, and bar is likewise being set to the extracted values from foo in an overlapping timeframe.

We claim that both statements are atomic. Therefor, user a must be isolated from user b, and, the database should have a state after execution of both a and b such that it look like a ran by itself, and then b ran, assuming a was presented first.

Let's assume that to implement this, the contents of the array bar are selected and copied into foo. To do this, you must read the contents of bar, and copy them into foo. Since there is more than one element, care must be taken to ensure that bar is not modified while it is being loaded into foo, and foo similarly must not be modified on another thread while the current thread is modifying it.

Because there is a bar and a foo, either being written from or read to, you multiple locks in a single unit of work, and, therefor, you have the potential for deadlocks. Continuing on, the example that I give deadlocks if the database engine locks on start of I/O. So, if it locks the left hand side and then the right, the database can deadlock because one thread's right is waiting for another thread's left, and vice versa.

There is a way to avoid this, but it's not what you are talking about.  foo = bar can be a functional declaration rather than an assignment. foo would be an accumulation of formulae based on bar, and vice versa, and I believe that those would run concurrently much more efficiently.  So, foo = bar would not be a copy operation, it would be an addition of a functional specifications to the list of such specifications in foo. When foo is ultimately selected from, (on the right hand side), elements in foo that are based on bar would be calculated at select time. Off the top of my head I cannot think of anything in the UPDATE statement that would require a lock on the right hand side, if all the UPDATE did was add a functional relationship to the target.

Such a scheme could certainly haul ass, but there are some serious theoretical problems that must be addressed. The one biggy that I see is that there will need to be some continous factoring of facts presented via the update to keep the search side down. That is, if I update Name = Todd and then Name = Bob, then, somehow I have to collapse the functional specification Name = Todd and Name = Bob into a single state. I do not know if every combination of expression is ultimately factorable. Are there prime predicate expressions? Intuitively I think this answer must be yes, there are. So these states are definately going to pile up.

Still, I think functional programming seems to offer the only real hope of avoiding many of the woes of serialization. I would say that the founders of SQL probably realized that real relational algrebra is essentially functional programming in nature, but, because they bailed on the predicate problem, they had to make do with simple assignment.

A relational database built on top of a functional programming approach might actually be the ticket we are all looking for. After I finish Commodity Server, I will probably implement this as an experiment.

> If you do true serialisation - only one update running at any one time,

True, but that is impractical for most applications.

> then
> unless you have user transactions in the equation you won't get deadlocks.

True, but a serialized database has less of a need for transactions.

> A
> more efficient serialisation would be to not only allow two 'overlapping'
> updates to run simultaneously.

See above. Isn't this the predicate based locking that we discussed earlier? Essentially, you want to lock on a where clause. This runs into some problems when multiple indeces are considered.

> The database is a single value. A non-scalar value consisting of relation
> values (that consist of tuple values that consist of scalar values).
>
> > atomic assignment of multiple
> > values is exactly what a transaction is
>
> Most people's idea of a transaction is of something that is not atomic over
> time.

Everything about transactions is ACID - Atomic, Consistent, Isolated and Durable. The point of the whole thing is that a transaction is a set of statements that look like they execute all at once.

>
> If your idea of a transaction is an atomic-in-logical-time change to multiple
> values within the non-scalar database value then your statement holds.

It is.

>
> Possibly, depending on the internal implementation code.

No, you must. It is serialization theory, AFAIK.

> It might seem silly, but it is infact a benefit because it is an example of
> independence. Your users are not dependent on internal deadlock detection
> schemes for their logical problem of deadlocks. The implementation can freely
> alter, tweak, tune or discard their deadlock detection schemes independently
> of any user visible scheme.

It can be useful to allow a checkin / checkout model of a selection, I do agree. However, deadlock detection is hard and not many people are good at it. How many Java programs are that hang for no reason because everyone randomly throws locks around without really doing a dependency analysis?

>
> Your argument is one that goes directly against the principle of data
> independence.

Nope. Received on Tue May 20 2003 - 04:52:34 CEST

Original text of this message