Re: Normalization and Derived Information

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Sat, 09 Oct 2004 15:58:42 -0400
Message-ID: <mtf9kc.so1.ln_at_mercury.downsfam.net>


--CELKO-- wrote:

>>> A trigger can compute the column and then it is always in sync. <<

>
> But then the computed value takes up space and it is faster to compute
> it in main storage than to read it off the disk. Also, triggers are
> proprietary inspite of the existence of standards for them.
>
> Let me some more things to the history table while I am here:
>
> CREATE TABLE PriceHistory
> (upc CHAR(9) NOT NULL,
> price DECIMAL (12,4) NOT NULL,
> start_date DATE NOT NULL,
> end_date DATE, -- null means current
> CONSTRAINT history_is_sorted
> CHECK (start_date <= end_date)
> CONSTRAINT periods_are_non_overlapping
> CHECK (NOT EXISTS
> (SELECT *
> FROM PriceHistory AS P1
> WHERE PriceHistory.upc = P1.upc
> AND PriceHistory.startdate <> P1.startdate
> AND PriceHistory.startdate BETWEEN P1.startdate AND
> P1.enddate)),
> CONSTRAINT one_current_price_per_product
> CHECK (NOT EXISTS
> (SELECT upc
> FROM PriceHistory AS P1
> GROUP BY upc
> HAVING COUNT(startdate) <> COUNT(enddate) -1),
> ..);
>
> The table level constraints are legal in Standard SQL, but not
> implement in most products yet.

Although I still believe that pre-stored calculated values are faster than calc-at-read, I'm going to drop that line for now and try to pursue your suggestion of the history table, I want to make sure I fully understand what you are suggesting.

So, the history table solves the problem of time as a dimension; it relieves us of the need to violate 3NF, which violation occurs if we copy price into the orders table. So if we go with the history table and preserve 3NF, then to get the extended price of the order, we are looking at something like this (not claiming SQL is perfect here):

SELECT ordh.ord_date,

       ordd.item_id,ordd.qty,
       ph.price,
       ph.price * ordd.qty as price_extended       
  FROM ORDERS ordh
  JOIN ORDERS_DETAIL ordd ON ordh.ord_num = ordd.ord_num   JOIN PRICE_HISTORY ph ON ordd.item = ph.item
                       AND ordh.ord_date >= ph.begin_date
                       AND (ordh.ord_date <= ph.end_date 
                            OR ph.end_date IS NULL)


From here my own non-negotiable design requirement is that people such as customer IT staff should never have to code the above, they should just be able to "SELECT price_extended " from somewhere, so the above can be put into a view and so far so good.

Well, hmmmmmm, we (read: you) seem to worked the time dimension in satisfactorily, we now have to ask what could screw this up? Well, complexity and deeply nested calculations will perform poorly at read, while materializing them will delay writes, so since we want fast writes (and small transactions) that's another nail in the coffin.

Another objection is that this approach will lead to a proliferation of tables, but that is not a bad thing, that is a good thing. One must have adequate tools for managing the creation and maintenance of tables.

Well, shoot, we also have the fact that a price history table is, in and of itself, a good thing.

Well, I'm just going to have to think about this...

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Sat Oct 09 2004 - 21:58:42 CEST

Original text of this message