Re: Newbie question about db normalization theory: redundant keys OK?

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Wed, 26 Dec 2007 19:20:14 +0200
Message-ID: <Pine.SOL.4.62.0712261812150.24116_at_kruuna.helsinki.fi>


On 2007-12-25, David BL wrote:

> A middleware layer allows Y to pump messages from Qx and it uses Nx to
> ensure it processes each message exactly once.

Essentially what you're doing here is using a timestamping mechanism and the acknowledgements from Y to implement a rudimentary, distributed agreement protocol. It is used to assure that the message gets delivered once, and only once. If the transaction touched more than two databases, you would have to get agreement from all of them because failure to apply the update against Y would imply that the effects against Z have to be cancelled as well. That is, you haven't really gotten rid of the agreement protocol, but only gone to an optimistic version of it, and perhaps in the process even restricted it to a two database context only.

> On X, a local transaction does the withdrawal on Ax and pushes message
> onto Qx that represents a deposit that needs to be performed on Ay.
> [...] Now consider that Y needs to cancel the transfer. Eg this could
> occur because by the time Y processes the deposit message the relevant
> account has been closed. [...] In fact a cancel of a transfer is
> nothing other than a deposit message to undo the original withdrawal.

In the literature this sort of thing is called an open nested transaction, and the "cancel" is called a compensating transaction/compensation. The construct is used precisely for the reasons you cite, i.e. for less locking interference, higher concurrency and general decoupling, but it also exacts a clear price. Under it consistency cannot be guaranteed unless a) the only way to access the database is through b) operations with known semantics, c) possessing clearly defined undo/"compensation" operations, d) which commute suitably amongst each other, and e) which have real world semantics which allow us to conclude that this new, lower level of distributed consistency is appropriate. If general (usually called "native") transactions are allowed to proceed in parallel with the semantic primitives, f) usual locking semantics still need to be available and to be applied to control interactions between the native (lower level) and the semantic (higher level) actions.

In fact, that you included account cancellations in the set of permissible operations already break c) and d). Consider a situation where some money is transferred from account X to account Y, Y gets cancelled before the message is processed, and then X gets cancelled before the reverse deposit is. Now the money is lost and no compensating action is available. You could perhaps augment the system with zombie accounts, cascading aborts and real world compensating actions (i.e. trace the money back to where it came from, even across multiple dead accounts, and mail it to the original owner), but that is complicated and once you move outside of the banking field, in general you can't guarantee that all of the necessary options are available.

As an example, consider a claims processing database. We want each claim to always be allocated to a handler, and no more than say ten claims to be allocated to a single handler at a time. If all handlers are fully occupied, there might even be a legal obligation not to accept new claims. If X now assigns all of his claims to Y and accepts ten new claims for processing, no matter what we do from there on the real world legal mistake of accepting too many claims for processing in toto cannot be avoided, or rectified once it's occurred. Under global consistency in the normal sense, that couldn't have happened because assignment would only have been possible if Y were actually able to accept the extra load then and there, atomically.

What allows you to get even as far as you did with message queues with the account transaction example is the way money behaves in a real economy, that is, semantics. For example, you can rely on money being fungible and divisible, people not wanting throw to it away (unlike burdens such as a claim to be processed), other people always wanting to accept more of it, and for everybody to go to great pains (like keeping useless accounts around for a time) in order not to lose any of it. Obviously such constraints do not hold for all of the applications databases can and do have.

> Note that when we account for the outstanding (ie not yet processed by
> receiver) deposit messages, the total money in the system is conserved
> by the local transactions.

The problem is that this sort of accounting involves hidden state, which allows us to break global integrity constraints. The user visible state of the system is no longer consistent; my money temporarily disappeared. Of course this is what happens with current interbank transfers, and there's nothing wrong with it as long as the real world semantics make sense. But the usual notion of distributed consistency employed by database researchers is much more ambitious than that. It also tries to support all the rest of the applications which *don't* have anything to do with accounting or its particular characteristics.

> The durability requirement on these transactions can be relaxed. For
> example when they commit they can release locks before flushing the
> log, making the locks less sensitive to file I/O.

Even this is not really true: the new state of the queue still needs to be forced into stable storage before the transaction can safely return. Otherwise a crash before the message insertion is made persistent would cause an inconsistency between the states the client and the DBMS see.

> I wonder if they always can be (when one considers the "bigger
> picture" and one isn't constrained by an existing legacy system).

Hmm. I would not easily criticise the modern work on consistency and concurrency based on a single example -- accounting -- that in itself is several millennia of age.

-- 
Sampo Syreeni, aka decoy - mailto:decoy_at_iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Wed Dec 26 2007 - 18:20:14 CET

Original text of this message