Re: Storing derived and derivable data

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Fri, 05 May 2006 13:48:29 GMT
Message-ID: <NII6g.3548$A26.92830_at_ursa-nb00s0.nbnet.nb.ca>


Kenneth Downs wrote:

> dawn wrote:
> 
> 

>>Jan Hidders wrote:
>>
>>>dawn wrote:
>>>
>>>>Jan Hidders wrote:
>>>>
>>>>>dawn wrote:
>>>>>
>>>>>>Is there database theory that includes identification of
>>>>>>
>>>>>>1. the fact that values for an attribute either were or could have
>>>>>>been derived?
>>>>>>2. how values for an attribute were derived?
>>>>>>3. how values for an attribute could have been derived?
>>>>>
>>>>>That is usuallly taken care of by specifying the database
>>>>>constraints that define the relationships between the derived column
>>>>>and the other data, and then marking it as either derived or not.
>>>>>
>>>>>This is rather trivial. So I'm wondering if I understood your
>>>>>question correctly.
>>>>
>>>>It is very possible that I'm missing something basic. I know how to
>>>>tie derived data specifications to the stored data from which they are
>>>>derived. Now let's use that derived data to determine a value and
>>>>store it back in an existing base table. Is there a way for the DBMS
>>>>to retain the information about the source of this materialized
>>>>attribute?
>>>
>>>What do you mean by 'the source' and what information do you want to
>>>keep about it? Apparently not the definition of the derived column. Are
>>>you talking about data provenance here?
>>
>>I'm going to skip attempting to translate most of the terminology I
>>typically use and give it another go.
>>
>>Example: I have a file STUDENT_COURSES that includes grades and a field
>>named COURSE_GRADE. In a STUDENT file, I have a virtual field named
>>CURRENT_GPA and backing it is code (stored procedure-like) that blows
>>through all courses for a student and computes the GPA. So, this
>>CURRENT_GPA field is a virtual field, derived data, computed column,
>>user-defined function or whatever you want to call it.
> 
> I would be careful to avoid the idea that the procedure is processing
> through the table row-by-row.  The calculated value of the CURRENT_GPA is
> intrinsic and should always be correct.  Whenever any row in any table that
> "feeds" this value changes, the changes must propagate all the way through
> within that transaction, and must be wholly visible everywhere when the
> transaction commits.  
> 

>>In my neck of the woods, that specification would be on the derived
>>field (CURRENT_GPA), but there would be no information on the
>>materialized version (M_GPA) that specifies that it is a snapshot of
>>CURRENT_GPA. So, there is nothing in the dbms that tells us things
>>like the fact that no apps should update M_GPA because there is a
>>process that derives and stores it and that is the only process that
>>should do that. Although it makes sense in my implementation that
>>there would be no constraint in the dbms (since there are few
>>constraints specified to the dbms), it sure would be nice to have
>>information that ties the materialized attribute to the derived
>>attribute.
> 
> You want to avoid a duality between two values, one materialized, one not. 
> Materialize it and be done with it, otherwise you will be fielding calls
> from people getting incorrect results.


Jan, do you see the sort of shit that results when you encourage the willful ignorants?

What's worse this latest ignorant uses multiple email addresses, which just makes more work for filtering the idiot. Received on Fri May 05 2006 - 15:48:29 CEST

Original text of this message