Oracle FAQ Your Portal to the Oracle Knowledge Grid

Home -> Community -> Usenet -> c.d.o.server -> updatable snapshot replication problem 10g (long)

updatable snapshot replication problem 10g (long)

From: Clark Wilson <>
Date: Thu, 13 Dec 2007 00:31:22 +0000 (UTC)
Message-ID: <>

I am trying to figure out how a particular situation comes about, so I can prevent it.

It is a single-master updatable snapshot replication set-up. All updates are done on the subordinate machine and replicated to the master machine.

The database is for the entry and update of items. For creating a new item, the item number is taken from a control row on the subordinate machine; the new item is created and the "next item number" column in the control row is incremented. The item number is part of the primary key and an attempt to re-use it results in a primary-key uniqueness violation. Transactions are committed frequently, so the deferred transactions have from one to three calls in them, never more than three calls. A column in each table records the date and time of the most recent update of that row.

The situation I am trying to explain is the following:

  1. Pre-condition: data is entered up through item #100 and is identical on both master and subordinate boxes.
  2. Users enter new items, say items 101-150. Call this "series A".
  3. Something happens.
  4. Users continue to enter new items on the subordinate box. The system feeds them item numbers that are already used but no primary key violations occur. Let's say they enter items 101-175. Let's call this "series B".
  5. Other stuff happens.
  6. I come on the scene. Series B exists on both the master and the subordinate boxes. Series A exists only as a collection of replication error transactions on the master box. These error transactions have reasonably spaced out creation times from the subordinate box; their reception times on the master box are all at once.

We can presume that the communications between the subordinate and master box have been turned off for long periods of time and then reconnected, an unknown number of times. However, neither database instance has been stopped and restarted.

If I posit the following sequence containing an impossible event, I can make sense of it:

0) Subordinate system and master system are not connected.

  1. Users enter series A. Deferred calls are stacked up in the local place where deferred calls get stored.
  2. Impossible event occurs: All *data* tables magically revert to their status as of just before series A, but the table(s) containing the stored deferred calls are not affected (the series A transactions are still in there).
  3. Users enter series B and a new series of deferred calls is appended to the place where deferred calls are stored. So now both series A and series B exist as deferred transactions on the subordinate box.
  4. The master and subordinate boxes are connected and replication begins. Series A transactions all fail because the transaction preconditions (original values) are wrong. Series B transactions succeed because the transaction preconditions (original values) are okay. The last-update date and time column ensures that A will fail and B will succeed even if other original values are identical.

Aside from the impossible event, this almost makes sense.

I do not have error-log entries to support this sequence. What I have is (for instance) messages on the subordinate box saying that posting a deferror encountered an inability to connect to the "target host" (ORA-12545). Or "Error 2068 trapped in 2PC on transaction ..." But the error messages do not reliably or obviously correlate with the data getting messed up. Received on Wed Dec 12 2007 - 18:31:22 CST

Original text of this message