Re: Storing derived and derivable data

From: dawn <dawnwolthuis_at_gmail.com>
Date: 1 May 2006 06:38:12 -0700
Message-ID: <1146490692.477238.258580_at_y43g2000cwc.googlegroups.com>


Jan Hidders wrote:
> dawn wrote:
> > Jan Hidders wrote:
> > > dawn wrote:
> > > > Is there database theory that includes identification of
> > > >
> > > > 1. the fact that values for an attribute either were or could have been
> > > > derived?
> > > > 2. how values for an attribute were derived?
> > > > 3. how values for an attribute could have been derived?
> > >
> > > That is usuallly taken care of by specifying the database constraints
> > > that define the relationships between the derived column and the other
> > > data, and then marking it as either derived or not.
> > >
> > > This is rather trivial. So I'm wondering if I understood your question
> > > correctly.
> >
> > It is very possible that I'm missing something basic. I know how to
> > tie derived data specifications to the stored data from which they are
> > derived. Now let's use that derived data to determine a value and
> > store it back in an existing base table. Is there a way for the DBMS
> > to retain the information about the source of this materialized
> > attribute?
>
> What do you mean by 'the source' and what information do you want to
> keep about it? Apparently not the definition of the derived column. Are
> you talking about data provenance here?

I'm going to skip attempting to translate most of the terminology I typically use and give it another go.

Example: I have a file STUDENT_COURSES that includes grades and a field named COURSE_GRADE. In a STUDENT file, I have a virtual field named CURRENT_GPA and backing it is code (stored procedure-like) that blows through all courses for a student and computes the GPA. So, this CURRENT_GPA field is a virtual field, derived data, computed column, user-defined function or whatever you want to call it.

Additionally, I have decided to materialize this value by firing off a process whenever any of the underlying stored data are changed so that the value of CURRENT_GPA at that point in time is stored in the STUDENT file as something like M_GPA. There could then be a slight lag where CURRENT_GPA and M_GPA are not in synch. Users will now want to query the M_GPA data because it is faster.

> > Or said differently, is there a way, other than app or trigger code, to
> > tie the materialized version of an attribute to the derived attribute
> > from which the value was taken?
>
> ?? The derived attribute *is* the materialized version.
>
> > David pointed me to Sybase as a DBMS that seems to have the feature of
> > storing materialized attributes, otherwise I'm figuring with most
> > DBMS's it is an app or trigger where the code resides to store the
> > data, but I might be completely missing something as my direct
> > experience with sql constraint specification is limited.
>
> One simply specifies the query that derives the value of the column,
> or in the case of a materlized view the whole table. That is the
> definition of the derived column. What more do you want?

In my neck of the woods, that specification would be on the derived field (CURRENT_GPA), but there would be no information on the materialized version (M_GPA) that specifies that it is a snapshot of CURRENT_GPA. So, there is nothing in the dbms that tells us things like the fact that no apps should update M_GPA because there is a process that derives and stores it and that is the only process that should do that. Although it makes sense in my implementation that there would be no constraint in the dbms (since there are few constraints specified to the dbms), it sure would be nice to have information that ties the materialized attribute to the derived attribute.

> > Given that
> > the RM has as a goal not to have duplicate data and this is clearly a
> > duplication of data, I was thinking this might be one place where the
> > application would have the knowledge, rather than the DBMS.
>
> Another goal of the RM is to shield the data from sloppy applications,
> so if duplication cannot be avoided then at least the DBMS should be in
> charge of it and not the applications.

Yes, that is my thinking. Any materialized attributes should be able to be specified as such. I might take a look at Sybase, but was looking for something less implementation-specific on how materialized attributes should be specified, identified in notation, etc. within a base table. Are there constraints that can be specified to indicate that a particular attribute "comes from" a derived attribute (or stored procedure)?

Sorry for my awkward terminology. Thanks. --dawn Received on Mon May 01 2006 - 15:38:12 CEST

Original text of this message