Re: Newbie question about db normalization theory: redundant keys OK?

From: David BL <davidbl_at_iinet.net.au>
Date: Tue, 25 Dec 2007 16:28:51 -0800 (PST)
Message-ID: <d71bfce2-d3da-4faa-875d-06efc87a2d82_at_s12g2000prg.googlegroups.com>


On Dec 25, 12:30 am, Sampo Syreeni <de..._at_iki.fi> wrote:
> On 2007-12-23, David BL wrote:
>
> > I've never thought much of multiphase commit protocols. Instead I'm a
> > fan of persistent message queues.
>
> I don't quite see them as solving the same problem. If your queue is
> such that all of the updates posted in it are always applied, then in a
> distributed environment it would seem that you need something like a
> multiphase commit to get agreement on what to insert into the queue in
> the first place. Or if some of the queued transactions can be rolled
> back, then you need distributed agreement when the queued transaction is
> finally executed. Finally, it obviously solves the problem if the
> transactions are serialized before insertion into the queue, but then we
> lose in concurrency and the queue easily becomes a central hot spot.
>
> So, could you perhaps elaborate a bit?

Consider the following hypothetical example: there are two geographically separated databases storing account balances. There is no replication - each DB records its own independent accounts.

Suppose we want a transfer between accounts, which needs a distributed transaction. The two phase commit (TPC) protocol involves a "prepare to commit" phase, and each DB must lock the relevant account balances. The locks aren't released until the participants acknowledge the commit message (assuming the presumed rollback protocol). Network failures could make this take a very long time, i.e. a network failure can compromise the autonomy of the databases. How bad is that!

Instead consider that we avoid TPC entirely, and use persistent message queues...

Suppose we want to transfer from account balance Ax on machine X into account balance Ay on machine Y.

Let X have a persistent message queue Qx that records one way messages to be sent from X to Y. Each message represents a deposit to be performed on an account on Y. Y records a persistent sequence number Nx for the number of messages it has processed from Qx. A middleware layer allows Y to pump messages from Qx and it uses Nx to ensure it processes each message exactly once.

Similarly Y has Qy for messages to be sent from Y to X and X records sequence number Ny.
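As a toy sketch of the pump-and-sequence-number mechanism (the names `qx`, `nx`, `balances_y` and the in-memory structures are stand-ins I've invented for the persistent state described above, not real middleware):

```python
# Toy sketch of exactly-once consumption from a persistent queue via a
# sequence number. In-memory structures stand in for stable storage.

qx = []                    # Qx: X's persistent queue of deposit messages for Y
nx = 0                     # Nx: Y's persistent count of Qx messages processed
balances_y = {"Ay": 100}   # Y's account table

def pump_one():
    """Process the next unprocessed Qx message on Y, exactly once.

    The deposit and the increment of nx happen in one local transaction
    on Y, so a crash followed by redelivery of the same message is
    recognised (nx already past it) and cannot apply it twice."""
    global nx
    if nx >= len(qx):
        return False                                # nothing outstanding
    msg = qx[nx]
    balances_y[msg["account"]] += msg["amount"]     # update Ay ...
    nx += 1                                         # ... and Nx atomically
    return True

qx.append({"account": "Ay", "amount": 50})  # X enqueues a deposit
pump_one()                                  # applied once
pump_one()                                  # no-op: already processed
```

The only deduplication state is the single counter `nx`, which is why it must be updated in the same local transaction as the account.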

A successful transfer involves the following steps:

  1. On X, a local transaction performs the withdrawal on Ax and pushes a message onto Qx representing a deposit to be performed on Ay. Ax and Qx are updated atomically on X.
  2. Y reads a message from Qx, then opens a transaction that atomically updates Ay and Nx.
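The two steps can be sketched in miniature (again with invented names and in-memory stand-ins for the persistent state; a conservation check is included to show the invariant that queued money is in flight, not lost):

```python
# Sketch of the two-step transfer of 30 from Ax on X to Ay on Y.
# Persistence is faked with in-memory state; no network I/O is modelled
# because neither local transaction needs any.

balances_x = {"Ax": 100}   # X's accounts
balances_y = {"Ay": 0}     # Y's accounts
qx = []                    # Qx: X's outbound deposit messages
nx = 0                     # Nx: Y's count of processed Qx messages

def step1_withdraw_and_enqueue(amount):
    """Local transaction on X: debit Ax and push the deposit message.
    Both updates are atomic on X."""
    balances_x["Ax"] -= amount
    qx.append({"account": "Ay", "amount": amount})

def step2_apply_deposit():
    """Local transaction on Y: read the next Qx message, then credit Ay
    and advance Nx atomically."""
    global nx
    if nx < len(qx):
        msg = qx[nx]
        balances_y[msg["account"]] += msg["amount"]
        nx += 1

step1_withdraw_and_enqueue(30)
# Between the two steps, the 30 is "in flight" in Qx; counting it keeps
# total money constant:
in_flight = sum(m["amount"] for m in qx[nx:])
assert balances_x["Ax"] + balances_y["Ay"] + in_flight == 100
step2_apply_deposit()
```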

Now consider that Y needs to cancel the transfer. E.g. this could occur because by the time Y processes the deposit message the relevant account has been closed. Then we simply introduce a message that cancels the transfer, which is somewhat analogous to a presumed commit TPC. In fact a cancellation of a transfer is nothing other than a deposit message that undoes the original withdrawal.
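The cancellation-as-compensating-deposit idea can be sketched the same way (hypothetical names; `qy` and `ny` mirror `qx` and `nx` in the reverse direction):

```python
# Sketch: Y cancels a transfer not by rolling anything back, but by
# pushing a compensating deposit onto Qy for X to apply. State picks up
# after X has already withdrawn 30 from Ax and enqueued the deposit.

balances_x = {"Ax": 70}                   # Ax after the withdrawal of 30
qx = [{"account": "Ay", "amount": 30}]    # the in-flight deposit for Y
nx = 0                                    # Y's counter for Qx
qy = []                                   # Qy: messages from Y back to X
ny = 0                                    # Ny: X's counter for Qy
closed_on_y = {"Ay"}                      # Ay has since been closed

def y_process_next():
    """Y's local transaction: consume one Qx message; if the target
    account is closed, enqueue a compensating deposit instead of
    crediting it. Qy and nx are updated atomically on Y."""
    global nx
    msg = qx[nx]
    if msg["account"] in closed_on_y:
        qy.append({"account": "Ax", "amount": msg["amount"]})  # the undo
    nx += 1

def x_process_next():
    """X's local transaction: apply the compensating deposit to Ax and
    advance ny atomically."""
    global ny
    msg = qy[ny]
    balances_x[msg["account"]] += msg["amount"]
    ny += 1

y_process_next()   # Y declines the deposit and emits the cancellation
x_process_next()   # X restores Ax to its original balance
```

No lock is held anywhere across the network round trip; each hop is an ordinary local transaction.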

Note that when we account for the outstanding deposit messages (i.e. those not yet processed by the receiver), the total money in the system is conserved by the local transactions. Note as well that local transactions are performed without any network I/O whatsoever.

The durability requirement on these transactions can be relaxed. For example, on commit they can release their locks before flushing the log, making lock hold times less sensitive to file I/O. Accounts are locked for extremely short periods. It is a vastly superior approach to TPC. IMO the whole idea of a "distributed transaction" is disgusting. It seems tragic for a system to use them if they can be avoided. I wonder if they always can be (when one considers the "bigger picture" and one isn't constrained by an existing legacy system).
