Re: Low level record retrieval and concurrency issue.

From: --CELKO-- <>
Date: 9 Feb 2003 12:06:35 -0800
Message-ID: <>

Let me do a "cut & paste" from my book, DATA & DATABASES:

20.2 Transactions and ACID

There is a handy mnemonic for the four characteristics we want in a transaction: the ACID properties. The initials are short for four properties we have to have in a transaction processing scheme.

  1. Atomicity. This means that the whole transaction becomes persistent in the database or nothing in the transaction becomes persistent. The data becomes persistent in SQL-92 when a COMMIT statement is successfully executed. The transaction is removed by a ROLLBACK statement, and the database is restored to the consistent state it was in before the transaction began.

The COMMIT or ROLLBACK statement can be explicitly executed by the user or by the database engine when it finds an error. Most SQL engines default to a ROLLBACK, unless configured to do otherwise.

Atomicity means that if I were to try to insert one million rows into a table and one row of that million violated a referential constraint, then the whole set of one million rows would be rejected and the database would do an automatic ROLLBACK WORK.
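This all-or-nothing behavior can be sketched with Python's sqlite3 module; the table and column names (Parts, part_nbr) are invented for the illustration, and the one-million-row insert is scaled down to three rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parts (part_nbr INTEGER PRIMARY KEY)")

rows = [(1,), (2,), (2,)]  # the third row violates the key constraint
try:
    with conn:  # one transaction: COMMIT on success, ROLLBACK on any error
        conn.executemany("INSERT INTO Parts (part_nbr) VALUES (?)", rows)
except sqlite3.IntegrityError:
    pass  # the engine rolled the whole insert back

count = conn.execute("SELECT COUNT(*) FROM Parts").fetchone()[0]
print(count)  # 0 -- none of the rows became persistent
```

The first two rows were valid, but because the third failed inside the same transaction, none of them survive the automatic rollback.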

Here is the trade-off. If you do one long transaction, then you are in danger of being screwed by just one tiny little error. However, if you do several short transactions in a session, then other users can have access to the database between your transactions and they might change things, much to your surprise.

The solution has been to implement SAVEPOINT or CHECKPOINT options, which act much like a bookmark. A transaction sets savepoints during its execution, which let the transaction perform a local rollback to that point. In our example, we might have been setting savepoints every 1,000 rows, so that when the 999,999-th row insert fails with an error that would have caused a ROLLBACK, the database engine removes only the work done after the savepoint was set, and the transaction is restored to the state of uncommitted work (i.e. rows 1 thru 999,000) that existed before the savepoint.
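A minimal sketch of this local rollback, again using sqlite3 (which supports SAVEPOINT and ROLLBACK TO); the table name Batch is invented and the batch is scaled down to a few rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode; we issue transaction commands ourselves
conn.execute("CREATE TABLE Batch (n INTEGER PRIMARY KEY)")

conn.execute("BEGIN")
conn.execute("INSERT INTO Batch VALUES (1)")
conn.execute("INSERT INTO Batch VALUES (2)")
conn.execute("SAVEPOINT sp1")  # the bookmark
try:
    conn.execute("INSERT INTO Batch VALUES (2)")  # duplicate key -> error
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK TO sp1")  # local rollback: only work after sp1 is removed
conn.execute("COMMIT")

saved = [r[0] for r in conn.execute("SELECT n FROM Batch ORDER BY n")]
print(saved)  # [1, 2] -- the work done before the savepoint survived
```

The failed insert is discarded, but the uncommitted work done before the savepoint remains and is committed at the end.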

This method was debated in the ANSI X3H2 Database Standards committee and ISO. Generally speaking, Americans favored savepoints, while the Europeans did not. The XA transaction model from the X/Open consortium is probably going to be the ISO standard, but you need to look at each particular product you are actually using.

  2. Consistency. When the transaction starts, the database is in a consistent state, and when the transaction becomes persistent in the database, the database is again in a consistent state. The phrase "consistent state" means that all of the data integrity constraints, relational integrity constraints, and any other constraints are true.

However, this does not mean that the database cannot pass through an inconsistent state during the transaction. The SQL-92 Standard has the ability to declare a constraint to be DEFERRABLE or NOT DEFERRABLE for finer control of a transaction.

Consider the single UPDATE statement to increment the PRIMARY KEY of a table:

UPDATE Foobar SET keycol = keycol +1

where the values of keycol are (1, 2, 3, 4). When any of the first three rows are incremented, we will have two rows with duplicate values and that temporarily violates the PRIMARY KEY constraint. But at the end of the UPDATE, the values are (2, 3, 4, 5) and that is consistent with constraints. The same principle applies when the transaction has multiple statements or fires triggers that affect other tables.
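Not every engine implements the end-of-statement timing the standard describes. SQLite, for example, has no deferrable UNIQUE checking and tests the constraint row by row, so this very UPDATE fails there, while an engine with standard statement-level (or DEFERRABLE) checking accepts it. A sketch of the SQLite behavior, as a contrast:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foobar (keycol INTEGER NOT NULL UNIQUE)")
conn.executemany("INSERT INTO Foobar (keycol) VALUES (?)", [(1,), (2,), (3,), (4,)])

failed = False
try:
    conn.execute("UPDATE Foobar SET keycol = keycol + 1")
except sqlite3.IntegrityError:
    failed = True  # 1 -> 2 collides with the not-yet-updated row holding 2

print(failed)  # True under SQLite's immediate, row-by-row checking
```

The temporary duplicate that SQL-92 tolerates during the statement is what trips the immediate check here; this is exactly the situation DEFERRABLE constraints were designed to handle.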

  3. Isolation. One transaction is isolated from all other transactions. Isolation is also called serializability because it means that transactions act as if they were executed in isolation from each other. One way to guarantee isolation is to go back to serial execution like we had in batch systems. In practice, this might not be a good idea, so the system has to decide how to interleave the transactions to get the same effect.

This actually becomes more complicated in practice because one transaction may or may not actually see the data inserted, updated or deleted by another transaction. This will be dealt with in detail in the section on isolation levels.

  4. Durability. The database is stored on durable media, so that if the database program is destroyed, the database itself persists. Furthermore, the database can be restored to a consistent state when the database system is restored. Log files and backup procedures figure into this property, as well as disk writes done during processing.

This is all well and good if you have just one user accessing the database at a time. But one of the reasons you have a database system is that you also have multiple users who want to access it at the same time in their own sessions. This leads us to concurrency control.

20.3 Concurrency

Concurrency control is the part of transaction handling which deals with how multiple users access the shared database without running into each other -- like a traffic light system. One way to avoid any problems is to allow only one user in the database at a time. The only problem with that solution is that the other users are going to get lousy response time. Can you seriously imagine doing that with a bank teller machine system or an airline reservation system where tens of thousands of users are waiting to get into the system at the same time?

20.3.1 The Three Phenomena

If all you do is execute queries against the database, then the ACID properties hold. The trouble occurs when two or more transactions want to change the database at the same time. In the SQL model, there are three ways that one transaction can affect another.

  1. P1 ("Dirty read"): Transaction T1 modifies a row. Transaction T2 then reads that row before T1 performs a COMMIT WORK. If T1 then performs a ROLLBACK WORK, T2 will have read a row that was never committed and that may thus be considered to have never existed.
  2. P2 ("Non-repeatable read"): Transaction T1 reads a row. Transaction T2 then modifies or deletes that row and performs a COMMIT WORK. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.
  3. P3 ("Phantom"): Transaction T1 reads the set of rows N that satisfy some <search condition>. Transaction T2 then executes statements that generate one or more rows that satisfy the <search condition> used by transaction T1. If transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.
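The second phenomenon, the non-repeatable read, can be sketched with two sqlite3 connections standing in for two sessions; the Accounts table and its columns are invented for the illustration, and SQLite behaves roughly like READ COMMITTED here because each SELECT issued outside an open transaction sees the latest committed data:

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file stand in for transactions T1 and T2.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
t1 = sqlite3.connect(path)
t2 = sqlite3.connect(path)

t2.execute("CREATE TABLE Accounts (acct_nbr INTEGER PRIMARY KEY, balance INTEGER)")
t2.execute("INSERT INTO Accounts VALUES (1, 100)")
t2.commit()

# T1 reads the row; fetchall() exhausts the statement so its read lock is released.
first = t1.execute("SELECT balance FROM Accounts WHERE acct_nbr = 1").fetchall()[0][0]

# T2 then modifies the row and performs a COMMIT WORK.
t2.execute("UPDATE Accounts SET balance = 50 WHERE acct_nbr = 1")
t2.commit()

# T1 rereads the row and receives the modified value: a non-repeatable read (P2).
second = t1.execute("SELECT balance FROM Accounts WHERE acct_nbr = 1").fetchall()[0][0]
print(first, second)  # 100 50

t1.close()
t2.close()
os.unlink(path)
```

T1 asked the same question twice and got two different answers, even though it did nothing wrong in between.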

These phenomena are not always bad things. If the database is being used only for queries, without any changes being made during the work day, then none of these problems will occur. The database system will run much faster if you do not have to try to protect yourself from them. They are also acceptable when changes are being made under certain circumstances.

Imagine that I have a table of all the cars in the world. I want to execute a query to find the average age of drivers of red sport cars. This query will take some time to run and during that time, cars will be crashed, bought and sold, new cars will be built, and so forth. But I can accept a situation with the three phenomena because the average age will not change that much from the time I start the query to the time it finishes. Changes after the second decimal place really don't matter.

However, you don't want any of these phenomena to occur in a database where the husband makes a deposit to a joint account and his wife makes a withdrawal. This leads us to the transaction isolation levels.

20.3.2 The Four Isolation Levels

In standard SQL, the user gets to set the isolation level of the transactions in his session. The isolation level avoids some of the phenomena we just talked about and gives other information to the database. The syntax for the <set transaction statement> is:

SET TRANSACTION <transaction mode list>

<transaction mode> ::=

   <isolation level>
 | <transaction access mode>
 | <diagnostics size>

<diagnostics size> ::= DIAGNOSTICS SIZE <number of conditions>

<transaction access mode> ::= READ ONLY | READ WRITE

<isolation level> ::= ISOLATION LEVEL <level of isolation>

<level of isolation> ::=

   SERIALIZABLE
 | REPEATABLE READ
 | READ COMMITTED
 | READ UNCOMMITTED

The optional <diagnostics size> clause tells the database to set up a list for error messages of a given size. This is an SQL-92 feature, so you might not have it in your particular product. The reason is that a single statement can have several errors in it and the engine is supposed to find them all and report them in the diagnostics area via a GET DIAGNOSTICS statement in the host program.

The <transaction access mode> explains itself. The READ ONLY option means that this is a query and lets the SQL engine know that it can relax a bit. The READ WRITE option lets the SQL engine know that rows might be changed, and that it has to watch out for the three phenomena.

The important clause, which is implemented in most current SQL products, is the <isolation level> clause. The isolation level of a transaction defines the degree to which the operations of one transaction are affected by concurrent transactions. The isolation level of a transaction is SERIALIZABLE by default, but the user can explicitly set it in the <set transaction statement>.

The four isolation levels all guarantee that each transaction will be executed completely or not at all, and that no updates will be lost. A ROLLBACK WORK statement may be initiated by the SQL engine when it detects the inability to guarantee the serializability of two or more concurrent transactions or when it detects unrecoverable errors.

Let's take a look at a table of the isolation levels and the three phenomena. A 'Yes' means that the phenomenon is possible under that isolation level:

  Isolation Levels and the Three Phenomena

  Isolation Level        P1      P2     P3

  SERIALIZABLE           No      No     No
  REPEATABLE READ        No      No    Yes
  READ COMMITTED         No     Yes    Yes
  READ UNCOMMITTED      Yes     Yes    Yes

The SERIALIZABLE isolation level is guaranteed to produce the same results as the concurrent transactions would have had if they had been done in some serial order. A serial execution is one in which each transaction executes to completion before the next transaction begins.  The users act as if they are standing in a line waiting to get complete access to the database.

A REPEATABLE READ isolation level is guaranteed to maintain the same image of the database to the user during his session.

A READ COMMITTED isolation level will let transactions in this session see rows that other transactions commit while this session is running.

A READ UNCOMMITTED isolation level will let transactions in this session see rows that other transactions create without necessarily committing while this session is running.

Regardless of the isolation level of the transaction, phenomena P1, P2, and P3 shall not occur during the implied reading of schema definitions performed on behalf of executing a statement, the checking of integrity constraints, and the execution of referential actions associated with referential constraints. We do not want the schema itself changing on users.

The SQL standards do not say HOW you are to achieve these results. However, there are two basic classes of concurrency control methods -- optimistic and pessimistic. Within those two classes, each vendor will have his own implementation.
