Re: Newbie question about db normalization theory: redundant keys OK?

From: David BL <davidbl_at_iinet.net.au>
Date: Wed, 26 Dec 2007 17:57:41 -0800 (PST)
Message-ID: <05180690-805c-4d19-ac91-5b565e1dfb42_at_a35g2000prf.googlegroups.com>


On Dec 27, 2:20 am, Sampo Syreeni <de..._at_iki.fi> wrote:
> On 2007-12-25, David BL wrote:
>
> > A middleware layer allows Y to pump messages from Qx and it uses Nx to
> > ensure it processes each message exactly once.
>
> Essentially what you're doing here is using a timestamping mechanism and
> the acknowledgements from Y to implement a rudimentary, distributed
> agreement protocol. It is used to assure that the message gets delivered
> once, and only once. If the transaction touched more than two databases,
> you would have to get agreement from all of them because failure to
> apply the update against Y would imply that the effects against Z have
> to be cancelled as well. That is, you haven't really gotten rid of the
> agreement protocol, but only gone to an optimistic version of it, and
> perhaps in the process even restricted it to a two database context
> only.

But can you provide a realistic example, and offer a "proof" that message queues are inadequate? Does such a demonstration exist in the literature?

> > On X, a local transaction does the withdrawal on Ax and pushes a message
> > onto Qx that represents a deposit that needs to be performed on Ay.
> > [...] Now consider that Y needs to cancel the transfer. Eg this could
> > occur because by the time Y processes the deposit message the relevant
> > account has been closed. [...] In fact a cancel of a transfer is
> > nothing other than a deposit message to undo the original withdrawal.
>
> In the literature this sort of thing is called an open nested
> transaction, and the "cancel" is called a compensating
> transaction/compensation. The construct is used precisely for the
> reasons you cite, i.e. for less locking interference, higher concurrency
> and general decoupling, but it also exacts a clear price. Under it
> consistency cannot be guaranteed unless a) the only way to access the
> database is through b) operations with known semantics, c) possessing
> clearly defined undo/"compensation" operations, d) which commute
> suitably amongst each other, and e) which have real world semantics
> which allow us to conclude that this new, lower level of distributed
> consistency is appropriate. If general (usually called "native")
> transactions are allowed to proceed in parallel with the semantic
> primitives, f) usual locking semantics still need to be available and to
> be applied to control interactions between the native (lower level) and
> the semantic (higher level) actions.
>
> In fact, that you included account cancellations in the set of
> permissible operations already breaks c) and d). Consider a situation
> where some money is transferred from account X to account Y, Y gets
> cancelled before the message is processed, and then X gets cancelled
> before the reverse deposit is. Now the money is lost and no compensating
> action is available.

When you say "Y is cancelled", do you mean an account on machine Y is being closed?

Closing an account doesn't happen synchronously. Consider that an account on Y is being closed.

  1. A local transaction on Y marks the account as closed, and "close account" messages are posted to its peers (by pushing messages onto the local queues).
  2. Given that peers only find out asynchronously that the account is being closed, there will be some period of time during which deposits continue to be made into the account. Y must continue pumping messages from its peers. Using a local transaction on Y, each deposit into the account marked as closed must be cancelled by pushing a compensating deposit onto the queue for the relevant peer.
  3. It is assumed that each peer (in a local transaction) eventually reads the message that the account is being closed. This stops it from performing any further deposits into the account. In the same local transaction the peer posts an acknowledgement message onto its local queue, which is pumped by Y.
  4. Eventually Y reads the acknowledgement messages from all its peers and determines that the account is now properly closed, because, given ordered delivery of messages in the queues, it is safe to assume there are no outstanding deposit messages for that account anywhere in the system. (A sketch of these steps follows this list.)
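
To make those steps concrete, here is a rough, self-contained sketch in Python (the names Node, pump, close_account and so on are mine, purely for illustration, not any real middleware API). Each method body stands for a single local transaction that touches only local state, i.e. the accounts and the local queues; no method does any network I/O.

from collections import deque

class Node:
    def __init__(self, name):
        self.name = name
        self.accounts = {}          # account -> balance
        self.closing = set()        # local accounts marked closed, awaiting acks
        self.remote_closed = set()  # (peer, account) pairs we must no longer credit
        self.outbox = {}            # peer name -> deque of messages to that peer
        self.pending_acks = {}      # account -> peers that have not yet acked

    def connect(self, peer):
        self.outbox[peer.name] = deque()

    # Step 1: one local transaction marks the account closed and queues a
    # "close account" message for every peer.
    def close_account(self, acct):
        self.closing.add(acct)
        self.pending_acks[acct] = set(self.outbox)
        for q in self.outbox.values():
            q.append(("close_account", acct))

    # A transfer on a peer: withdraw locally and queue a deposit message.
    def transfer(self, peer, src, dst, amount):
        if (peer.name, dst) in self.remote_closed:
            raise ValueError("account known to be closed")
        self.accounts[src] -= amount
        self.outbox[peer.name].append(("deposit", dst, src, amount))

    # Steps 2-4: pump one message that `peer` queued for us; each call is
    # one local transaction.
    def pump(self, peer):
        kind, *body = peer.outbox[self.name].popleft()
        if kind == "deposit":
            dst, src, amount = body
            if dst in self.closing:
                # Step 2: cancel by pushing a compensating deposit back.
                self.outbox[peer.name].append(("deposit", src, dst, amount))
            else:
                self.accounts[dst] += amount
        elif kind == "close_account":
            # Step 3: stop depositing into it and acknowledge in the same
            # local transaction.
            self.remote_closed.add((peer.name, body[0]))
            self.outbox[peer.name].append(("ack_close", body[0]))
        elif kind == "ack_close":
            # Step 4: when every peer has acked, the account is properly closed.
            self.pending_acks[body[0]].discard(peer.name)
            if not self.pending_acks[body[0]]:
                print(self.name, "account", body[0], "fully closed")

# Demo: X has a deposit in flight when Y starts closing the account.
X, Y = Node("X"), Node("Y")
X.connect(Y); Y.connect(X)
X.accounts["ax"] = 100
Y.accounts["ay"] = 0
Y.close_account("ay")             # step 1
X.transfer(Y, "ax", "ay", 30)     # X has not yet seen the close message
Y.pump(X)                         # late deposit arrives, bounced back (step 2)
X.pump(Y)                         # X learns of the close and acks (step 3)
X.pump(Y)                         # X receives the compensating deposit
Y.pump(X)                         # Y gets the ack: properly closed (step 4)
print(X.accounts, Y.accounts)     # ax is back to 100: no money lost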

> You could perhaps augment the system with zombie
> accounts, cascading aborts and real world compensating actions (i.e.
> trace the money back to where it came from, even across multiple dead
> accounts, and mail it to the original owner), but that is complicated
> and once you move outside of the banking field, in general you can't
> guarantee that all of the necessary options are available.

I think the above steps allow any number of accounts to be closed, and no money will be lost.

> As an example, consider a claims processing database. We want each claim
> to always be allocated to a handler, and no more than say ten claims to
> be allocated to a single handler at a time. If all handlers are fully
> occupied, there might even be a legal obligation not to accept new
> claims. If X now assigns all of his claims to Y and accepts ten new
> claims for processing, no matter what we do from there on the real world
> legal mistake of accepting too many claims for processing in toto cannot
> be avoided, or rectified once it's occurred.

I don't agree. There should be a concept of a claim that is not currently allocated to a handler, and the claim is passed around the network until it is assigned to a handler. Allocation of a claim to a handler is a local decision made in a local transaction, which allows the legal obligation to be enforced.
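
To illustrate (a minimal sketch of my own, with invented names, not anything from a real claims system): the ten-claim limit is only ever checked inside a local transaction against local state, and a node whose handlers are all full simply forwards the claim to a peer instead of accepting it, so the legal obligation cannot be violated.

from collections import deque

MAX_CLAIMS_PER_HANDLER = 10

class ClaimsNode:
    def __init__(self, name, handlers):
        self.name = name
        self.handlers = {h: [] for h in handlers}   # handler -> open claims
        self.inbox = deque()                        # claims routed to this node
        self.outbox = deque()                       # claims forwarded to a peer

    # One local transaction: take a claim from the inbox and either assign it
    # to a handler with spare capacity or pass it on. The legal limit is only
    # ever checked against local state.
    def process_one(self):
        claim = self.inbox.popleft()
        for handler, claims in self.handlers.items():
            if len(claims) < MAX_CLAIMS_PER_HANDLER:
                claims.append(claim)                # legal: capacity exists here
                return self.name + ": " + claim + " assigned to " + handler
        self.outbox.append(claim)                   # no capacity: keep it moving
        return self.name + ": " + claim + " forwarded to a peer"

# Usage: a fully loaded node never accepts an eleventh claim, it forwards it.
x = ClaimsNode("X", ["alice"])
x.handlers["alice"] = ["old-%d" % i for i in range(10)]   # alice is at the limit
x.inbox.append("claim-42")
print(x.process_one())    # -> X: claim-42 forwarded to a peer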

> Under global consistency in
> the normal sense, that couldn't have happened because assignment would
> only have been possible if Y were actually able to accept the extra load
> then and there, atomically.

A 100% guarantee of global consistency is provably impossible, and multiphase commit protocols can get it wrong under certain (perhaps unlikely) failure conditions. It is far from clear to me that message queues cannot provide similar statistical guarantees.

> What allows you to get even as far as you did with message queues with
> the account transaction example is the way money behaves in a real
> economy, that is, semantics. For example, you can rely on money being
> fungible and divisible, people not wanting to throw it away (unlike
> burdens such as a claim to be processed), other people always wanting to
> accept more of it, and for everybody to go to great pains (like keeping
> useless accounts around for a time) in order not to lose any of it.
> Obviously such constraints do not hold for all of the applications
> databases can and do have.

It's unfortunate if the argument in favour of multiphase commit protocols is that the real world is messy and complicated. I would like to see a simple example that shows they are necessary.

I tend to think of 2PC (never mind 3PC) as messy and complicated compared to message queues. The problem is not so much with the protocol when it succeeds, but rather with how it behaves when there are failures. How many programmers properly understand how 3PC avoids most but not all blocking that can occur in 2PC? Under what conditions will the protocol fail to give consistency? The recovery mechanisms for these protocols are horribly complicated.
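
To be concrete about the failure behaviour that bothers me, here is a toy illustration of my own (not anyone's actual implementation) of the blocking window in 2PC: once a participant has voted YES it is in doubt, and if the coordinator then becomes unreachable it can neither commit nor abort on its own, so it sits there holding its locks.

class Participant:
    def __init__(self):
        self.state = "working"    # working -> prepared -> committed / aborted

    def prepare(self):
        # Phase 1: force a prepare record to the log, keep all locks, vote YES.
        self.state = "prepared"
        return "YES"

    def decide(self, verdict):
        # Phase 2: the coordinator's verdict, "committed" or "aborted".
        self.state = verdict

    def coordinator_unreachable(self):
        if self.state == "prepared":
            # The blocking window: the participant cannot commit on its own
            # (some other participant may have voted NO) and cannot abort on
            # its own (the coordinator may already have logged COMMIT), so it
            # must hold its locks until the coordinator is heard from again.
            return "blocked, holding locks"
        return "can abort unilaterally"

p = Participant()
p.prepare()
print(p.coordinator_unreachable())    # -> blocked, holding locks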

Do you agree that by Occam's razor we should favour message queues, unless or until we have demonstrated that 2PC/3PC is necessary?

> > Note that when we account for the outstanding (ie not yet processed by
> > receiver) deposit messages, the total money in the system is conserved
> > by the local transactions.
>
> The problem is that this sort of accounting involves hidden state, which
> allows us to break global integrity constraints. The user visible state
> of the system is no longer consistent; my money temporarily disappeared.
> Of course this is what happens with current interbank transfers, and
> there's nothing wrong with it as long as the real world semantics make
> sense. But the usual notion of distributed consistency employed by
> database researchers is much more ambitious than that.

Perhaps too ambitious :)

There are a number of papers about the impossibility of properly defining global properties of a distributed system. How do you define global integrity constraints? In what sense could you hope to get a "consistent cut"? I feel like you want something that doesn't and can't exist.

Can you provide an example where message queues create a visible inconsistency for an end user whereas 2PC would avoid it?

Paradoxically, the asynchronous approach is so much more efficient that it will reach quiescence far more quickly than any 2PC approach could hope to. Synchronising transactions over the wire is devastating for performance.

> It also tries to
> support all the rest of the applications which *don't* have anything to
> do with accounting or its particular characteristics.
>
> > The durability requirement on these transactions can be relaxed. For
> > example when they commit they can release locks before flushing the
> > log, making the locks less sensitive to file I/O.
>
> Even this is not really true: the new state of the queue still needs to
> be forced into stable storage before the transaction can safely return.
> Otherwise a crash before the message insertion is made persistent would
> cause an inconsistency between the states the client and the DBMS see.

No, the queue is part of the local DB state. Local transactions ensure atomicity across all local DB state irrespective of durability.

These local transactions don't do any network I/O. There is no need for them to be flushed.

The need to force the log relates only to the durability requirement on locally queued messages that are asynchronously posted to a peer. This gives a lot of freedom to force the log far less often than the rate at which messages are ingested.
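
As a small illustration of that point (using SQLite purely as a stand-in for the local database on X; the table names are mine, with queue_to_y playing the role of Qx), the queue is just another local table, so the debit of Ax and the queued deposit message for Ay commit or roll back together in one ordinary local transaction, with no network I/O and no distributed commit protocol involved. How often the log is forced is then a separate, purely local tuning decision.

import sqlite3

db = sqlite3.connect(":memory:")     # stands in for the local database on X
db.executescript("""
    CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE queue_to_y (seq INTEGER PRIMARY KEY AUTOINCREMENT, msg TEXT);
    INSERT INTO account VALUES ('Ax', 100);
""")

def transfer_to_y(amount):
    # One local transaction: debit Ax and enqueue the deposit message for Ay.
    # sqlite3 commits on clean exit from the with-block, rolls back on error,
    # so the debit and the queued message can never be separated.
    with db:
        db.execute("UPDATE account SET balance = balance - ? WHERE id = 'Ax'",
                   (amount,))
        db.execute("INSERT INTO queue_to_y (msg) VALUES (?)",
                   ("deposit Ay " + str(amount),))

transfer_to_y(30)
print(db.execute("SELECT balance FROM account").fetchone())   # (70,)
print(db.execute("SELECT msg FROM queue_to_y").fetchall())    # the queued deposit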

> > I wonder if they always can be (when one considers the "bigger
> > picture" and one isn't constrained by an existing legacy system).
>
> Hmm. I would not easily criticise the modern work on consistency and
> concurrency based on a single example -- accounting -- that in itself is
> several millennia of age.

Yes, it's only one example.
