Re: Normalization and Derived Information
Date: Sat, 09 Oct 2004 15:58:42 -0400
Message-ID: <mtf9kc.so1.ln_at_mercury.downsfam.net>
--CELKO-- wrote:
>>> A trigger can compute the column and then it is always in sync. <<
>
> But then the computed value takes up space and it is faster to compute
> it in main storage than to read it off the disk. Also, triggers are
> proprietary inspite of the existence of standards for them.
>
> Let me some more things to the history table while I am here:
>
> CREATE TABLE PriceHistory
> (upc CHAR(9) NOT NULL,
> price DECIMAL (12,4) NOT NULL,
> start_date DATE NOT NULL,
> end_date DATE, -- null means current
> CONSTRAINT history_is_sorted
> CHECK (start_date <= end_date)
> CONSTRAINT periods_are_non_overlapping
> CHECK (NOT EXISTS
> (SELECT *
> FROM PriceHistory AS P1
> WHERE PriceHistory.upc = P1.upc
> AND PriceHistory.startdate <> P1.startdate
> AND PriceHistory.startdate BETWEEN P1.startdate AND
> P1.enddate)),
> CONSTRAINT one_current_price_per_product
> CHECK (NOT EXISTS
> (SELECT upc
> FROM PriceHistory AS P1
> GROUP BY upc
> HAVING COUNT(startdate) <> COUNT(enddate) -1),
> ..);
>
> The table level constraints are legal in Standard SQL, but not
> implement in most products yet.
So, the history table solves the problem of time as a dimension; it relieves us of the need to violate 3NF, which violation occurs if we copy price into the orders table. So if we go with the history table and preserve 3NF, then to get the extended price of the order, we are looking at something like this (not claiming SQL is perfect here):
SELECT ordh.ord_date,
ordd.item_id,ordd.qty, ph.price, ph.price * ordd.qty as price_extendedFROM ORDERS ordh
JOIN ORDERS_DETAIL ordd ON ordh.ord_num = ordd.ord_num JOIN PRICE_HISTORY ph ON ordd.item = ph.item
AND ordh.ord_date >= ph.begin_date AND (ordh.ord_date <= ph.end_date OR ph.end_date IS NULL)
From here my own non-negotiable design requirement is that people such as customer IT staff should never have to code the above, they should just be able to "SELECT price_extended " from somewhere, so the above can be put into a view and so far so good.
Well, hmmmmmm, we (read: you) seem to worked the time dimension in satisfactorily, we now have to ask what could screw this up? Well, complexity and deeply nested calculations will perform poorly at read, while materializing them will delay writes, so since we want fast writes (and small transactions) that's another nail in the coffin.
Another objection is that this approach will lead to a proliferation of tables, but that is not a bad thing, that is a good thing. One must have adequate tools for managing the creation and maintenance of tables.
Well, shoot, we also have the fact that a price history table is, in and of itself, a good thing.
Well, I'm just going to have to think about this...
-- Kenneth Downs Use first initial plus last name at last name plus literal "fam.net" to email meReceived on Sat Oct 09 2004 - 21:58:42 CEST