Path: dp-news.maxwell.syr.edu!spool.maxwell.syr.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.icl.net!newsfeed.fjserv.net!newsfeed.freenet.de!newspeer1.nwr.nac.net!newspeer.monmouth.com!newscon06.news.prodigy.com!prodigy.net!border1.nntp.dca.giganews.com!nntp.giganews.com!local01.nntp.dca.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Fri, 05 May 2006 07:08:13 -0500
Message-ID: <b4rri3-tv7.ln1@pluto.downsfam.net>
From: Kenneth Downs <knode.wants.this@see.sigblock>
Subject: Re: Storing derived and derivable data
Newsgroups: comp.databases.theory
Date: Fri, 05 May 2006 07:56:17 -0400
References: <1145622076.958951.174100@t31g2000cwb.googlegroups.com> <1146405867.756355.5100@j33g2000cwa.googlegroups.com> <1146455943.415608.156690@i40g2000cwc.googlegroups.com> <1146488907.381952.155590@e56g2000cwe.googlegroups.com> <1146490692.477238.258580@y43g2000cwc.googlegroups.com>
Organization: Secure Data Software, Inc.
User-Agent: KNode/0.8.2
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7Bit
Lines: 69
X-Trace: sv3-5H6xE/4jtySrASkFTBeHwFVLJkV9QQk1uluAlkXe3vj712teDDZDFxzvJu9lP/xwjd3RCGbe+lKM92W!r+I+cTOajC6LDCGFEHsoqfR+NOUXs4IH5JITGJcbvgQUevjKkj5fZKGzPQ==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.32
Xref: dp-news.maxwell.syr.edu comp.databases.theory:39218

dawn wrote:

> Jan Hidders wrote:
>> dawn wrote:
>> > Jan Hidders wrote:
>> > > dawn wrote:
>> > > > Is there database theory that includes identification of
>> > > >
>> > > > 1. the fact that values for an attribute either were or could have
>> > > > been derived?
>> > > > 2. how values for an attribute were derived?
>> > > > 3. how values for an attribute could have been derived?
>> > >
>> > > That is  usuallly taken care of by specifying the database
>> > > constraints that define the relationships between the derived column
>> > > and the other data, and then marking it as either derived or not.
>> > >
>> > > This is rather trivial.  So I'm wondering if I understood your
>> > > question correctly.
>> >
>> > It is very possible that I'm missing something basic.  I know how to
>> > tie derived data specifications to the stored data from which they are
>> > derived.  Now let's use that derived data to determine a value and
>> > store it back in an existing base table.  Is there a way for the DBMS
>> > to retain the information about the source of this materialized
>> > attribute?
>>
>> What do you mean by 'the source' and what information do you want to
>> keep about it? Apparently not the definition of the derived column. Are
>> you talking about data provenance here?
> 
> I'm going to skip attempting to translate most of the terminology I
> typically use and give it another go.
> 
> Example: I have a file STUDENT_COURSES that includes grades and a field
> named COURSE_GRADE.  In a STUDENT file, I have a virtual field named
> CURRENT_GPA and backing it is code (stored procedure-like) that blows
> through all courses for a student and computes the GPA.  So, this
> CURRENT_GPA field is a virtual field, derived data, computed column,
> user-defined function or whatever you want to call it.

I would be careful to avoid the idea that the procedure is processing
through the table row-by-row.  The calculated value of the CURRENT_GPA is
intrinsic and should always be correct.  Whenever any row in any table that
"feeds" this value changes, the changes must propagate all the way through
within that transaction, and must be wholly visible everywhere when the
transaction commits.  

> 
> In my neck of the woods, that specification would be on the derived
> field (CURRENT_GPA), but there would be no information on the
> materialized version (M_GPA) that specifies that it is a snapshot of
> CURRENT_GPA.  So, there is nothing in the dbms that tells us things
> like the fact that no apps should update M_GPA because there is a
> process that derives and stores it and that is the only process that
> should do that.  Although it makes sense in my implementation that
> there would be no constraint in the dbms (since there are few
> constraints specified to the dbms), it sure would be nice to have
> information that ties the materialized attribute to the derived
> attribute.

You want to avoid a duality between two values, one materialized, one not. 
Materialize it and be done with it, otherwise you will be fielding calls
from people getting incorrect results.

-- 
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth@(Sec)ure(Dat)a(.com)
