Re: schema help
Date: Tue, 25 Dec 2007 20:10:38 -0800 (PST)
Message-ID: <259688da-b70f-4595-9610-15ae0458047c_at_s8g2000prg.googlegroups.com>
On Dec 26, 7:40 am, Rob <rmpsf..._at_gmail.com> wrote:
> On Dec 25, 12:37 pm, magawake <magaw..._at_gmail.com> wrote:> Hello:
> > I am trying to learn SQL and RDBMS theories. To acheive this, I
> > decided to assign myself a project.
>
> > My project is basically, account for my NFS mounts, and keep track of
> > them. I want to see which filesystems are growing, and which
> > filesystems are staying the same. I will get an inventory daily. I
> > will grab the data and place it into a database, using PERL or AWK.
>
> [some illustrative data]
>
>
>
> > So my intentention is, to show do stats on these filesystems, and see
> > when it grows and when it stays the same. I will get these stats
> > daily, and hopefully in a month, I can see what is growing, shrinking,
> > and staying the same.
>
> > Any ideas on the schema? do I really need a RDBMS for this?
>
> > TIA
>
> A fascinating question I've long pondered. Since I am interested in
> the general question, I'll take the liberty to restate it, accurately
> I hope:
>
> Since a relational database represents the model state of a known
> universe, is it possible to ask/answer the question: How has the
> database changed over time?
>
> Snodgrass advocates adding begin/end time attributes, for the data
> entry window and the data validity window (with representations for
> minus- and plus infinity). But am I really modeling change-over-time
> or just adding timeframes to snapshots? (Like using a timed exposure
> to take a picture with a camera when there's too little light.)
>
> Would it not be better to utilize the OP's data capture approach by
> periodically evaluating a query, saving the query response externally
> and computing "deltas" between query responses (extensions) over
> time?? (This is fundamentally the data warehouse approach in which a
> relational database is used to capture periodic snapshots.)
>
> What do other people think about this? Is it better to implement a
> temporal database or to capture snapshots of a non-temporal database
> (externally or in a data warehouse) and compute the changes?
Interesting to how this relates to a discussion in another thread...
Consider that there are two separate databases. The State-DB records the current state S without any regard for history. The History-DB only records the history as a sequence of deltas [d1,d2,...,dn] but is not directly concerned with calculating the current state.
There is a concept of applying outstanding deltas to S to bring it up
to date:
S' = S + [dm,...,dn].
In that sense the State-DB is just a read only view and only the
History-DB is regarded as the primary source of the data to which
updates are applied with transactions. Furthermore these updates tend
IMO bringing the State-DB up to date is best performed asynchronously. The State-DB can persist a sequence number to ensure it applies each delta in the total order exactly once. A thread can access the State-DB and by reading the sequence number, easily determine the subset of the History-DB that is associated with the current state of the State-DB. Furthermore, in theory with pure additive changes to the History-DB it would be possible to allow read only threads to access the History-DB concurrently with its mutative transactions. This is kind of like MVCC for free. See
http://en.wikipedia.org/wiki/Multiversion_concurrency_control
Often deltas can be regarded as events occurring at some point in space and time. For example, a marriage event causes the state to change from unmarried to married. A birth event causes a person to be added to a family tree database.
Clearly such events are to be recorded but shouldn't require subsequent modification. I tend to think of a History-DB as an *events* database, and often represents the proper way to record the primary source of information (as distinct from the current state which is going to depend a lot more on one's conceptual model and the application). I would go as far as saying that the events DB should be associated with the base relvars, and the DBMS should support the asynchronous calculation of the current state as a read only view.
I imagine that only an events DB has some realistic chance of making the world's data look like a single information source - because it has less application bias and completely avoids the quandaries of distributed transactions.
For certain applications I anticipate that an OO physical representation could be desirable for caching specialised uses (navigations) of the data in a derived State-DB. However there is little doubt that the RM is best for general use.
Note that a History-DB can easily and efficiently make its deltas available to any number of asynchronous State-DBs without blocking its mutative threads (and upsetting the History-DB ingestion rate).
When scaling to distributed data sources (ie a State-DB works against multiple History-DBs), vector times are relevant. See
http://en.wikipedia.org/wiki/Vector_clocks Received on Wed Dec 26 2007 - 05:10:38 CET