Re: Normalization and Derived Information

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Tue, 12 Oct 2004 10:02:51 -0400
Message-ID: <b6ogkc.34m.ln_at_mercury.downsfam.net>


Tony Andrews wrote:

> Bill H wrote:

>> There are some excellent reasons for storing calculated values;

> especially
>> in accounting and other money handling applications.  In fact, one

> might
>> even go so far as to state it is advisable to maintain duplicate data

> in
>> order to effectuate financial error control.

>
> I never quite "get" the accounting and audit mind-set. In this example
> we have 2 choices:
> 1) Materialize the total on demand: "the total value of orders is
> $1249.43"
> 2) Maintain the total as duplicate data, so that we can display on
> demand: "the total value of orders is $1249.43"
>
> Now in case (1) the total is sure to be right, whereas in (2) it could
> be wrong because our stored total has somehow got out of sync with the
> detail data. Yet this is somehow seen as a "good thing", because if we
> add up all the details and find that that DO agree with the stored
> total (as they should), we somehow have more confidence in the total.
> Why? I am sure I must be missing something.

The total in case 1 is sure to be wrong, which I state from the experience of watching it happen. This is because it is very easy to consider the case of extended price = price * qty, and from there to imagine that Order Total = SUM(extended price). What could go wrong, right? Who could mess that up?

The problem is that that is just the beginning. It gets more and more complicated with discounts, taxes, commissions, freight and so forth. The discount alone can apply to the entire order, to some items by category, or be based on the vendor supplying the items. This complication in an of itself does not mean it will be wrong, but human beings are not good at complicated situations.

The most devastating problem is the fact that a healthy program is changing over time, people are adding to it. If the shop does not have adequate regression testing (and of course everywhere we've all worked has always had perfect regression testing, right?) then programmers will break previous calculations when they introduce new ones. Happens all the time. Then the customer is screaming on Christmas eve, "My invoices are different from the orders I printed two weeks ago, you fix it now!"

Then of course the programmers, in the rush created by scenarios like the above, will put code out that just plain has bugs. Happens to the best of us. So a customer calls up on Friday at 5:00pm saying, "I printed this invoice two weeks ago and today the numbers are wrong." So you say, "That's good though, because we fixed a bug for you!" "What? You mean my invoices have been wrong? How long? Were they TOO LOW!? How much money did you cost me?"

Finally is the fact that the base numbers being referenced, prices, discount percents, commission percents and so forth can change over time. If the changes are not stored somehow somewhere then you cannot reprint history documents. That's very bad.

In all cases, the problem is time. On day 1 you have one set of data and code, and on day x you have different code, data, or both. Any strategy that guarantees correctness must take this into account. Materialization is only one strategy.

The audit-trail idea is just about the way people work. I once watched three people work for 3 hours each on a quote until they all got the same answer, which was wrong, it was thousands of dollars low. I got the right answer in 30 minutes, but only because I don't trust myself and I wrote out every single step of every calculation. When I showed it to them, they agreed it was right because each of them went through every single calculation. When I wrote the program to do it, I materialized all of those steps, so that when the boss questioned a quote and insisted they prove the computer go it right, they could pull the detailed calculations and walk through them. Even the boss could do it! :)

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Tue Oct 12 2004 - 16:02:51 CEST

Original text of this message