Re: schema help

From: Rob <rmpsfdbs_at_gmail.com>
Date: Wed, 26 Dec 2007 12:07:32 -0800 (PST)
Message-ID: <7141596b-b01b-46ae-85dd-02dac2173752_at_i12g2000prf.googlegroups.com>


On Dec 25, 9:10 pm, David BL <davi..._at_iinet.net.au> wrote:
> Interesting to how this relates to a discussion in another thread...
>
I'll try to respond to your well-thought-out response, though you have gone way beyond my question.
>
> Let S' = S + d denote the application of delta d to state S to yield a
> new state S'.   Often S+d is much easier to calculate than d = S' -
> S.  In fact the latter is not always uniquely defined.
>
Agreed. In current systems, d1,....,dn would seem to correspond to a change log. The log can be used to roll forward or roll back. At issue is the form of d. If each d is a before image and and after image, then d is uniquely defined, though S'-S is not.
>
> Consider that there are two separate databases.  The State-DB records
> the current state S without any regard for history.   The History-DB
> only records the history as a sequence of deltas [d1,d2,...,dn] but is
> not directly concerned with calculating the current state.
>
Here I think you are expressing the spirit of the question: Is it OK to store the deltas in a database model? Just to be clear, let's assume that the initial database state S0 (S zero) is the empty database. As the OP suggests, I could maintain a sequence of database states S0, S1, ..., Sn in which case I could (imperfectly) compute the deltas. Or, I can maintain a sequence of deltas and perfectly compute any state. You suggest that I could have two sequences, one of deltas and one (of lower periodicity) for states. I think that's what your next paragraph says:
>
> There is a concept of applying outstanding deltas to S to bring it up
> to date:
>
>     S' = S + [dm,...,dn].
>
> In that sense the State-DB is just a read only view and only the
> History-DB is regarded as the primary source of the data to which
> updates are applied with transactions.  Furthermore these updates tend
> to only add extra tuples to relvars rather than removing or modifying
> existing tuples.  Therefore in theory the History-DB can provide
> excellent ingestion rates and be optimised for writing new deltas as
> sequential data to disk.  Note that the information (as deltas) in the
> History-DB is total ordered.
>
Just one observation: Sets are not ordered, so the total ordering of the History-DB is up to the app, not the RDBMS. Your point about better "ingestion rates" intuitively makes sense, but can the database designer cause the RDBMS to write tuples sequentially to disk? This kind of optimization would likely be implemented at the system level, so in a way, you are suggesting an architectural meta meta model far different from the SQL meta meta model, at least as I understand it.
>
> IMO bringing the State-DB up to date is best performed
> asynchronously.  The State-DB can persist a sequence number to ensure
> it applies each delta in the total order exactly once.  A thread can
> access the State-DB and by reading the sequence number, easily
> determine the subset of the History-DB that is associated with the
> current state of the State-DB.  Furthermore, in theory with pure
> additive changes to the History-DB it would be possible to allow read
> only threads to access the History-DB concurrently with its mutative
> transactions.  This is kind of like MVCC for free.  See
>
>    http://en.wikipedia.org/wiki/Multiversion_concurrency_control
>
I see your point. But of course with "pure additive changes", your above assertion (d = S' - S is not uniquely defined) no longer holds. The wikipedia discussion around MVCC gets into more "how" than is necessary here: The OP doesn't suggest multiple updaters, just a database-as-recording-device from which he could compute deltas. Personally, I need to understand the single user variant before wading into the more complex, multiuser case.
>
> Often deltas can be regarded as events occurring at some point in
> space and time.   For example, a marriage event causes the state to
> change from unmarried to married.  A birth event causes a person to be
> added to a family tree database.
>
> Clearly such events are to be recorded but shouldn't require
> subsequent modification. I tend to think of a History-DB as an
> *events* database, and often represents the proper way to record the
> primary source of information (as distinct from the current state
> which is going to depend a lot more on one's conceptual model and the
> application).  I would go as far as saying that the events DB should
> be associated with the base relvars, and the DBMS should support the
> asynchronous calculation of the current state as a read only view.
>
Again, a little more complex than I can follow. If the current state is a read only view, then either it is never brought up to date, or, the database is periodically shut down and all the events since the last current state are applied, replacing the old current state with the new current state. I assume there is some read only view of the events that answers the question "what has changed between the old and new current states?" (That's the question the OP wants to answer.)
>
> I imagine that only an events DB has some realistic chance of making
> the world's data look like a single information source - because it
> has less application bias and completely avoids the quandaries of
> distributed transactions.
>
"making the the world's data look like a single information source" is beyond my ken as is the rest of your reply. (I'm just a humble toolmaker.) I do thank you for taking the time to reply.
>
> For certain applications I anticipate that an OO physical
> representation could be desirable for caching specialised uses
> (navigations) of the data in a derived State-DB.  However there is
> little doubt that the RM is best for general use.
>
> Note that a History-DB can easily and efficiently make its deltas
> available to any number of asynchronous State-DBs without blocking its
> mutative threads (and upsetting the History-DB ingestion rate).
>
> When scaling to distributed data sources (ie a State-DB works against
> multiple History-DBs), vector times are relevant. See
>
>    http://en.wikipedia.org/wiki/Vector_clocks- Hide quoted text -
>
> - Show quoted text -
Received on Wed Dec 26 2007 - 21:07:32 CET

Original text of this message