replication via tuple streams

From: Marshall <>
Date: 28 Mar 2007 11:15:21 -0700
Message-ID: <>

It seems like there are two ways to implement database replication in a master/slave configuration: replicating DML or replicating insert/delete tuple streams.

(I will speak of the single table case without considering schema changes.)

We have database master and slave, and table A in each. In the replicate-DML case, any DML statement (insert, update, delete) issued to master.A is replicated to slave.A and applied. This sounds very tidy, in that the DML statement itself is likely to be quite compact. However there is a limitation. Consider table master.B that is not replicated, and a statement like:

  insert into A select * from B

When that statement arrives at the slave, it won't be able to execute it because it doesn't have table B.

So this approach more or less requires replication for the transitive closure of the set of tables you want + the set of tables that might get mentioned in DML.

In another approach, we replicate instead the tuple streams. Imperative statements may cause some tuples to be deleted, or some to be inserted, or some to be deleted then some to be inserted (update.) The lack of table B in the above query is no difficulty, because the actual tuples in the subselect are transmitted, which is exactly what needs to be inserted to slave.A.

This has another advantage in that those tuple streams themselves are interesting.

However suppose we have a table of YouTube videos.

create table Videos(
  Id int primary key,
  LastViewed datetime,
  Video long blob

Key, the time of last viewing, and the video itself. The first two are small fixed size fields, and the third is ginormous. Suppose our application is that every time someone watches a video, we update LastViewed.

If we are replicating that table via insert/delete, then every update to LastViewed will transmit the whole tuple, including the ginormous video blob, twice. We could try to code up an update stream to join the insert/delete streams, but this adds a lot of complexity. The update stream isn't simple tuples; it has variable "width" in that different subsets of attributes may get changed.

While writing this message it occurred to me that one could decompose the Videos table into

(Id, LastViewed) and
(Id, Video)

and replicate them separately, and the update problem goes away. (Is this called 6th normal form, or something?)

Anyway, if anyone has any thoughts or comments or previous experience or pointers to interesting reading, I'd be interested.


PS. I hear Oracle replicates tuple streams and MySQL replicates DML. True? Received on Wed Mar 28 2007 - 20:15:21 CEST

Original text of this message