Re: replication via tuple streams
Date: Thu, 29 Mar 2007 01:54:07 GMT
Message-ID: <3%EOh.16769$PV3.173102_at_ursa-nb00s0.nbnet.nb.ca>
Marshall wrote:
> It seems like there are two ways to implement database
> replication in a master/slave configuration: replicating
> DML or replicating insert/delete tuple streams.
>
> (I will speak of the single table case without considering
> schema changes.)
>
> We have database master and slave, and table A in each.
> In the replicate-DML case, any DML statement (insert,
> update, delete) issued to master.A is replicated to
> slave.A and applied. This sounds very tidy, in that the
> DML statement itself is likely to be quite compact.
> However there is a limitation. Consider table master.B
> that is not replicated, and a statement like:
>
> insert into A select * from B
>
> When that statement arrives at the slave, it won't
> be able to execute it because it doesn't have table B.
>
> So this approach more or less requires replication for
> the transitive closure of the set of tables you want +
> the set of tables that might get mentioned in DML.
>
> In another approach, we replicate instead the tuple
> streams. Imperative statements may cause some tuples
> to be deleted, or some to be inserted, or some to be
> deleted then some to be inserted (update.) The lack
> of table B in the above query is no difficulty, because
> the actual tuples in the subselect are transmitted,
> which is exactly what needs to be inserted to slave.A.
>
> This has another advantage in that those tuple streams
> themselves are interesting.
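The tuple-stream idea above can be sketched in a few lines of Python (all names here are hypothetical, just to make the mechanics concrete): the master logs each inserted or deleted tuple as it applies a statement, and the slave replays that stream without ever needing the non-replicated table B.

```python
# Minimal sketch of tuple-stream replication (hypothetical names).
# The master runs "insert into A select * from B" and emits the
# actual tuples; the slave replays them without having table B.

master_A = set()
master_B = {(1, 'x'), (2, 'y')}   # B exists only on the master
slave_A = set()
stream = []                        # the replicated tuple stream

def master_insert_select():
    """Apply the insert-select on the master, logging each tuple."""
    for row in master_B:
        master_A.add(row)
        stream.append(('insert', row))

def slave_apply(events):
    """The slave needs only the stream, never table B itself."""
    for op, row in events:
        if op == 'insert':
            slave_A.add(row)
        elif op == 'delete':
            slave_A.discard(row)

master_insert_select()
slave_apply(stream)
```

After replay, slave_A holds exactly the tuples of master_A, which is the whole point: the stream carries values, not statements.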
>
> However suppose we have a table of YouTube videos.
>
> create table Videos(
> Id int primary key,
> LastViewed datetime,
> Video long blob
> );
>
> Key, the time of last viewing, and the video itself.
> The first two are small fixed size fields, and the third
> is ginormous. Suppose our application is that every
> time someone watches a video, we update LastViewed.
>
> If we are replicating that table via insert/delete, then
> every update to LastViewed will transmit the whole
> tuple, including the ginormous video blob, twice. We
> could try to code up an update stream to join the
> insert/delete streams, but this adds a lot of complexity.
> The update stream isn't simple tuples; it has variable
> "width" in that different subsets of attributes may get
> changed.
Choose the third way: handle it at the physical level and send only what is necessary, though not necessarily the DML.
In the case of updates, one would need to send a tuple identifier plus the parts that changed. As long as the video didn't change, there is no need to send it.
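A column-level delta of that kind is easy to express (this is only a sketch of the representation, with made-up names, not any particular product's log format): the delta is the key plus a map of just the changed attributes.

```python
# Sketch of a column-level update delta: ship the key and only
# the attributes whose values changed, never the unchanged blob.

def make_delta(key, old, new):
    """old and new are dicts mapping column name -> value."""
    changed = {c: v for c, v in new.items() if old.get(c) != v}
    return (key, changed)

def apply_delta(table, delta):
    """table maps key -> row dict; patch only the changed columns."""
    key, changed = delta
    table[key].update(changed)

old = {'LastViewed': '2007-03-28', 'Video': b'<ginormous blob>'}
new = {'LastViewed': '2007-03-29', 'Video': b'<ginormous blob>'}
delta = make_delta(42, old, new)

table = {42: dict(old)}
apply_delta(table, delta)
```

Since the blob compared equal, the delta carries only the key and the new LastViewed value; applying it on the slave reproduces the updated row.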
> While writing this message it occurred to me that
> one could decompose the Videos table into
>
> (Id, LastViewed) and
> (Id, Video)
>
> and replicate them separately, and the update problem
> goes away. (Is this called 6th normal form, or something?)
>
> Anyway, if anyone has any thoughts or comments or
> previous experience or pointers to interesting reading,
> I'd be interested.
6th normal form has a number of advantages. Thinking about it has led me to think we need more convenient ways to specify joins among sets of tables, and to specify constraints over those joined sets.
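For what it's worth, the decomposition is easy to try out. Here is a small sqlite3 sketch (table names VideoViews and VideoBlobs are my invention) showing that once the table is split on the key, an update to LastViewed touches only the narrow table, so a per-table tuple stream never has to carry the blob:

```python
# Sketch of the (Id, LastViewed) / (Id, Video) decomposition
# using sqlite3; table names are hypothetical.
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
create table VideoViews(Id integer primary key, LastViewed text);
create table VideoBlobs(Id integer primary key, Video blob);
""")
con.execute("insert into VideoViews values (1, '2007-03-28')")
con.execute("insert into VideoBlobs values (1, x'00')")

# Recording a viewing updates only the narrow table; a tuple-stream
# replicator for VideoViews emits a small delete/insert pair, and
# VideoBlobs generates no traffic at all.
con.execute("update VideoViews set LastViewed = '2007-03-29' where Id = 1")

# The original table is recovered by joining on the shared key.
row = con.execute(
    "select v.Id, v.LastViewed from VideoViews v "
    "join VideoBlobs b on v.Id = b.Id").fetchone()
```

The join on Id reconstructs the original Videos relation on demand, which is why the convenience of specifying such joins (and constraints across them) matters once you decompose this way.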