Re: Normalization and Derived Information

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Thu, 23 Sep 2004 11:10:28 -0400
Message-ID: <51puic.ai9.ln_at_mercury.downsfam.net>


Gene Wirchenko wrote:

> Christopher Browne <cbbrowne_at_acm.org> wrote:
>

>>The world rejoiced as Gene Wirchenko <genew_at_mail.ocis.net> wrote:
>>> "Laconic2" <laconic2_at_comcast.net> wrote:
>>>
>>> [snip]
>>>
>>>>The advantage of the extended price is that it can be summed.  (other
>>>>trivia about star schema design omitted).
>>>
>>>      And quantity times price can not?
>>>           select sum(quantity*price) as extendedsum from orders
>>
>>That might require a join against a table with prices, maybehaps...

>
> ...which is othogonal to whether quantity*price can be summed.
>

Orthogonal to the discussion of *if* it can be done, but crucial to the discussion of if it *should* be done.

I have done such a thing, and I have very mixed feelings about it. About two years ago I was working in a shop that had a 700+ table app that was perfectly sparse when it came to derived columns, there did not exist a single one anywhere. They had a client-side system for generating the information, and it was wheezing along well enough, but it was horrible for new guys like me and customers, who could never figure out how the numbers were calculated, because they were undocumented and undocumentable. If you weren't the person who specified the formula, you'd never find out what it was. Practices were also such that the formula might be changed at the request of one customer, thereby changing it for other customers. Ouch.

They also had no data dictionary, so I coded up something that would be compatible with their practices. Naturally in such cases it was 90% code to maintain compatibility with the random practices of the prior 5 years of development, but anything I did new I could keep systematic (and of course in doing add my own random practice ha ha ha).

So if its not too late to make a long story short, I coded up a view generator that would take formulas and foreign key definitions from the DD and generate what I called the "default view", one per table. This view would contain all of the derived columns. The fun part was working out the code to build those joins, but we had a lot of problems that do not show up if you materialize things, such as:

  1. Circular references. Discount Code is in header, but they want discounts calculated line by line. The header view now pulls in the discount percent. The Line Items view pulls in the discount percent from the header view and derives discount amounts. But now how do you sum the extended amounts to the header? The header view cannot reference the detail view because the detail view already references the header view. The possible solutions to this make grown men cry.
  2. Hopelessly unreadable formulas. Consider a chain of definitions such as Total = lines + freight, tax = total * tax rate, final = tax + total. Neither SQL Server nor DB/2 (our target platforms) allow you to reference derived columns in the query where they are derived, so you must nest them in code, like so:

SELECT blah,blah

   lines + freight as Total,
   (lines + freight) * tax_rate as Tax,
   ((lines + freight) * tax_rate) + (lines + freight) as Final

Except that that is not right because you don't know how the platform is going to round things, so you actually need every single numeric to be wrapped in a ROUND, and it gets ugly fast:

SELECT blah,blah

   ROUND(lines + freight,2) as Total,
   ROUND(    ROUND(lines + freight,2) * tax_rate,2) as Tax,
   ROUND(    ROUND(lines + freight,2) * tax_rate,2) + 
             ROUND(lines + freight,2) as Final

Don't get me wrong, it solved many many problems, it is totally self-documenting, cross-platform, version-protecting and lots of other good stuff, but it is not complete. It leaves things undone, and generates code a human cannot verify easily (which in my book is a crime for a code generator). A solution that does not have those problems would be preferable.

If you materialize things, both problems actually disappear, though the solution to problem one is not obvious at first glance.

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Thu Sep 23 2004 - 17:10:28 CEST

Original text of this message