Re: the relational model of data objects *and* program objects

From: Kenneth Downs <knode.wants.this_at_see.sigblock>
Date: Thu, 14 Apr 2005 11:01:30 -0400
Message-Id: <99d2j2-rkq.ln1_at_pluto.downsfam.net>


erk wrote:

>

>> That being said, the moment that the user supplies data is the best

> time to
>> store as well the derived information.   This allows their retrieval

> in a
>> way that is really and truly "at will", inasmuch as the SELECT

> command can
>> be exected at will.

>
> I disagree that it's the "best time." Logically, it makes no difference
> when the value is computed, so long as it is correct, so we're only
> speaking of performance, right?

mmm, not really. The claim "best time" is based on showing all other times have problems, which is really gone into below, so I won't pursue it here.

>

>>From a performance standpoint, then, it makes sense to compute values

> during non-peak times - only if possible, of course. A value that's
> requested must be computed if not already stored. But recomputing for
> every change means more wasted work.

Depends heavily on the nature of the work. When building big marketing databases you do everything in a big batch offline and then load it. But when saving a sales order I would argue (without proof, that's for another day) that the small processing time to compute and write derived values far outweighs the problems of not having them. But alas I have to say this without proof and allow any criticisms to pass until I have more than just anecdotes to backup the position.

>

>> All other solutions depart from this maximum efficiency.  The problem

> is
>> shifting definitions over time.  When formulas change over time, you

> have
>> to jump through some serious hoops to preserve the ability to

> reproduce
>> historical information.

>
> Another school of thought would say that if the formula is changing,
> then presumably the previous values were wrong. If a formula change
> produces "new historical values," does that mean the previous ones were
> wrong?

Two possibilities, bug and upgrade.

First, the stored values are wrong due to a bug in the formulas, but they have been printed/EDI'd to a trading partner. Now the wrong information is a matter of record. The school of thought that loves explicit audit trails will correct the bug in the formulas for new transactions, but keep the old "inert" data as-is. The user then enters additional adjusting transactions. This same school of thought will reject a "fix" that requires any great effort to reprint old documents as they originally appeared, requiring that a reprint always reproduce exactly the same result as the first time.

Second, the business rules have changed to an extent that only new formulas will take care of it, the existing run-time configuration parameters are not sufficient. In this case I simply redouble the suggestion that it is easiest to let sleeping dogs lie and evaluate and store all new transactions by the new rules.

>
> More likely is a hybrid context: a formula change will affect some
> values and not others. Stored values makes this and the "all old values
> are wrong" cases difficult.

Wish I could say this never happens. The general rule of thumb is to never alter data in tables, period. Even derived values, though they are produced by formulas written by us, do not belong to us. We do direct updates of old data in the name of fixes only after a thorough conversation with the decision-makers about the relative costs of adjusting entries vs. programmer fix, the irreversibility of programmer fix vs. the trackability of adjusting entries, and so forth. Generally it turns out that when the programmer has truly blundered, he is forced to make a fix.

>
> Hopefully all of the above is done automatically via declarations, by
> the "engine" we're discussing?

Yes.

>

>> On the other hand, if you materialize the values, they are protected

> against
>> shifting definitions.

>
> But you don't always want that protection.

Believe it or not, I agree. Having made the case as best I can above, I woulld offer that you probably don't want stored values for things like summary tables (although I do have a nifty trick to do that in real-time w/o update conflicts).

Here is a case where I have not made up my mind. Consider a derived value, customer total exposure. It is built up through composition from orders and invoices. There is another derived value, customer credit limit. By making use of simple formulas and composition, we can begin to establish constraints as simple comparisons of these values, so that "customer total exposure" must be < = "credit limit". Materializing up to the customer table runs the risk of update conflicts, but then again how often are multiple people writing to the same customer?

-- 
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth_at_(Sec)ure(Dat)a(.com)
Received on Thu Apr 14 2005 - 17:01:30 CEST

Original text of this message