
Single-statement 'write consistency' on read committed. Oh, really?

From: Tengiz Kharatishvili <tengizk_at_Hotmail.com>
Date: 29 Nov 2003 17:21:44 -0800
Message-ID: <3fcc40e7.0311291721.46084513@posting.google.com>


Please note that in the following I'm only concerned about single-statement read committed transactions. I do realize that for a multi-statement read committed transaction Oracle does not guarantee transaction set consistency without techniques like select for update or explicit hand-coded locking.

According to the documentation, Oracle guarantees 'statement-level transaction set consistency' for queries in read committed transactions. In many cases, Oracle also provides single-statement write consistency. However, when an update based on a consistent read tries to overwrite changes committed by other transactions after the statement started, it creates a write conflict. Oracle never reports write conflicts on read committed; instead, it automatically handles them based on the new values for the target table columns referenced by the update.

Let's consider a simple example. Again, I do realize that the following design might look strange or even sloppy, but the ability to produce a quality design when needed is not the issue here. I'm simply trying to understand Oracle's behavior on write conflicts in a single-statement read committed transaction.

A valid business case behind the example is rather common: a financial institution with two-stage funds transfer processing. First, you submit a transfer (put the transfer amounts in the 'pending' column of the accounts) while the whole financial transaction is still in doubt. Second, after you have got all the necessary confirmations, you clear all the pending transfers: make the corresponding account balance changes, reset the pending amounts, and mark the accounts cleared by setting the cleared date. Neither stage should leave the data in an inconsistent state: sum(amount) over all rows should not change, and sum(pending) over all rows should always be 0 after either stage:

Setup:

    create table accounts
    (
      acc     int primary key,
      amount  int,
      pending int,
      cleared date
    );
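To populate the table, assume for concreteness the following rows (the starting amounts of 10 per account are an assumption, chosen only to keep the arithmetic easy to follow; any amounts would do):

    insert into accounts values (1, 10, -2, null);
    insert into accounts values (2, 10,  2, null);
    insert into accounts values (3, 10,  0, sysdate);
    commit;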

Initially the table contains the following:

           ACC AMOUNT PENDING CLEARED
             1     10      -2
             2     10       2
             3     10       0 29-NOV-03

So, there is a committed database state with a pending funds transfer of 2 dollars from acc 1 to acc 2. Let's submit another transfer of 1 dollar from acc 1 to acc 3, but not commit it yet, in SQL*Plus Session 1:

    update accounts
    set pending = pending - 1, cleared = null
    where acc = 1;

    update accounts
    set pending = pending + 1, cleared = null
    where acc = 3;

           ACC AMOUNT PENDING CLEARED
             1     10      -3
             2     10       2
             3     10       1

And now let's clear all the pending transfers in SQL*Plus Session 2 in a single-statement read-committed transaction:

    update accounts
    set amount = amount + pending, pending = 0, cleared = sysdate
    where cleared is null;

Session 2 naturally blocks. Now commit the transaction in session 1. Session 2 readily unblocks:

           ACC AMOUNT PENDING CLEARED
             1      7       0 29-NOV-03
             2     12       0 29-NOV-03
             3     10       1

Here we go: the results produced by the single-statement read committed transaction in session 2 are inconsistent. The second funds transfer has not completed in full; the debit has been applied to acc 1, but the credit to acc 3 is still pending. Session 2 should have produced the following instead:

           ACC AMOUNT PENDING CLEARED
             1      7       0 29-NOV-03
             2     12       0 29-NOV-03
             3     11       0 29-NOV-03

Please note that we would have gotten the correct results if we ran the transactions in session 1 and session 2 serially. Please also note that no update has been lost. The type of isolation anomaly observed is usually referred to as a 'read skew', which is a variation of 'fuzzy read' a.k.a. 'non-repeatable read'.
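The violated invariants are easy to check (with the starting amounts assumed above, sum(amount) has drifted from 30 to 29, and sum(pending) from 0 to 1):

    -- Should always return the original total and 0; after the
    -- interleaved execution above it returns 29 and 1 instead.
    select sum(amount), sum(pending) from accounts;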

But if in session 2, instead of the statement above, we had issued a variant whose where clause also references pending (a column the competing transaction changes), we'd have gotten what we really wanted; two such variants are sketched below.
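For instance (the particular predicates here are illustrative reconstructions; what matters is only that they reference the pending column):

    -- Scenario 2: the predicate references pending; on the conflict
    -- Oracle restarts the statement and picks up a fresh snapshot:
    update accounts
    set amount = amount + pending, pending = 0, cleared = sysdate
    where pending <> 0;

    -- Scenario 3: the changed row satisfies the predicate both before
    -- and after the competing update, yet Oracle still restarts:
    update accounts
    set amount = amount + pending, pending = 0, cleared = sysdate
    where cleared is null and pending > -1000;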

I'm very well aware of the 'select for update' and serializable isolation level solutions for the problem. Also, I could present a working example for precisely the above scenario for another major database product, producing the results that I would consider to be correct. That is, the interleaved execution of the transactions has the same effect as if they had completed serially. Naturally, no extra hand-coded locking techniques like select for update or explicit locking are involved.
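For completeness, here is roughly what those two workarounds look like for this example (a sketch only; error handling, in particular retrying on ORA-08177 under serializable, is omitted):

    -- Workaround 1: lock the target rows first, then update them.
    -- The 'for update' waits out the concurrent writer, so the
    -- subsequent update takes its snapshot after that commit.
    select acc from accounts where cleared is null for update;

    update accounts
    set amount = amount + pending, pending = 0, cleared = sysdate
    where cleared is null;
    commit;

    -- Workaround 2: run the clearing statement at serializable
    -- isolation; on a write conflict it raises ORA-08177
    -- ("can't serialize access") and must be retried.
    alter session set isolation_level = serializable;

    update accounts
    set amount = amount + pending, pending = 0, cleared = sysdate
    where cleared is null;
    commit;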

And now let's try to understand what has just happened. Playing around with similar trivial scenarios, one can easily figure out that Oracle employs different strategies when handling update conflicts, based on the new values for the target table columns referenced by the update. I have observed the following cases:

  A. The column values have not changed: Oracle simply resumes using the current version of the row. This is perfectly fine because the database view presented to the statement (and hence the final state of the database after the update) is no different from what would have been presented had there been no conflict at all.
  B. The row (including the columns being updated) has changed, but the predicate columns haven't (see scenario 1): Oracle resumes using the current version of the row. Formally, this is acceptable too, as ANSI read committed is by definition prone to certain anomalies anyway (including the instance of 'read skew' we've just observed), and leaving behind somewhat inconsistent data can be tolerated as long as the isolation level permits it. But please note: this is not 'single-statement write consistent' behavior.
  C. Predicate columns have changed (see scenario 2 or 3): Oracle rolls back and then restarts the statement, making it look as if it did indeed present a consistent view of the database to the update statement. However, what seems confusing is that sometimes Oracle restarts when it isn't necessary, e.g. when the new values for the predicate columns don't change the predicate itself (scenario 3). In fact, it's a bit more complicated: I have also observed restarts on some index column changes, and triggers and constraints change things a bit too, but for the sake of simplicity let's not go there yet. (A trigger-based way to observe the restarts is sketched after this list.)
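A before-update row trigger makes the restarts visible: when Oracle rolls back and restarts the statement, the trigger fires again for rows it has already processed. (This is only an observation aid; as noted above, triggers can themselves change the restart behavior, so read the output with that caveat in mind.)

    -- In SQL*Plus, run 'set serveroutput on' first. During a single
    -- update, a second burst of lines for the same acc values means
    -- the statement was rolled back and restarted.
    create or replace trigger accounts_bur
    before update on accounts
    for each row
    begin
      dbms_output.put_line('updating acc = ' || :old.acc ||
                           ', pending '     || :old.pending ||
                           ' -> '           || :new.pending);
    end;
    /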

And here come the questions, assuming that (B) is not a bug, but the expected behavior:  

  1. Does anybody know why it has never been documented in detail when exactly Oracle restarts automatically on write conflicts, given that there are cases when it should restart but won't? Many developers would hesitate to depend on the feature as long as it's not 'official'. Hence, the lack of information makes it virtually useless for critical database applications, and a careful application developer would be forced to use either the serializable isolation level or hand-coded locking even for a single-statement transaction.

If, on the other hand, it's been documented, could anybody please point me to the bit in the documentation that:

  a. Clearly states that Oracle might restart an update statement in a read committed transaction because otherwise it would produce inconsistent results.
  b. Unambiguously explains the circumstances under which Oracle does restart.
  c. Gives clear and unambiguous guidelines on when Oracle doesn't restart, and therefore when to use techniques like select for update or the serializable isolation level even in a single-statement read committed transaction.
  2. Does anybody have a clue what the motivation was for this peculiar design choice of restarting for only a certain subset of write conflicts? What is so special about them? Since (B) is acceptable for read committed, why does Oracle bother with automatic restarts in (C) at all?
  3. If, on the other hand, Oracle envisions statement-level write consistency as an important advantage over other mainstream DBMSs, as is clear from its handling of (C), does anybody have any idea why Oracle wouldn't fix (B) using well-known techniques and always produce consistent results?
Received on Sat Nov 29 2003 - 19:21:44 CST

