Re: Normalization and Derived Information

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Fri, 24 Sep 2004 17:30:51 -0400
Message-ID: <cm32jc.44i.ln_at_mercury.downsfam.net>


--CELKO-- wrote:

>>> A trigger can compute the column and then it is always in sync. <<

>
> But then the computed value takes up space and it is faster to compute
> it in main storage than to read it off the disk.

OK, as far as speed to read, I suspect you are wrong in many cases. Note the word "suspect"; I'm not about to run a bunch of tests, but to wit:

  1. In very small reads, the disk sector reads may actually be the same in either case.
  2. In the next larger set of reads, dozens to thousands of rows, the amount of time to read a few more sectors is still << T(h), where T(h) is the Time humans can actually measure or detect.
  3. In the next larger set of reads, which accounts for a good chunk of real-world data access, you perform some summary report for a middle manager. Here you may be right by some measurable few percent, but if I may be so blunt, who cares?
  4. In the next larger set of reads, you pull 10,000 rows to print labels, and here the bottleneck is the printer, so again the extra time is << T(h).
  5. Finally, beyond the simple examples come the cases where we now have to join to other tables to get components of a calculation, and now there are *more* reads for a calculated value than for a stored one.

In my experience your statement is true in practical application only when you actually start moving megabytes of data. And even then we always materialized the data so that the reads would be no-brainers for anybody required to cut tapes or send out data.

But beyond that, since we are arguing practical concerns (instead of whether or not we've violated some theoretical argument), the existence of the data makes it possible for *anyone* (authorized) to read it, while the requirement to calculate puts the customer at the mercy of the front-end that is pulling and retrieving. I have seen so many horrific situations caused by this that I consider them a proof by contrapositive.
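To make the "trigger keeps it in sync" idea from the quoted line concrete, here is a minimal sketch using SQLite via Python's sqlite3. The table and column names (order_details, extended) are hypothetical, and SQLite's trigger syntax stands in for whatever product you actually run:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_details (
    order_id INTEGER,
    upc      TEXT,
    qty      INTEGER,
    price    NUMERIC,
    extended NUMERIC   -- derived: qty * price, maintained by triggers
);

-- Recompute the derived column whenever its inputs are written.
CREATE TRIGGER od_ins AFTER INSERT ON order_details
BEGIN
    UPDATE order_details
       SET extended = NEW.qty * NEW.price
     WHERE rowid = NEW.rowid;
END;

CREATE TRIGGER od_upd AFTER UPDATE OF qty, price ON order_details
BEGIN
    UPDATE order_details
       SET extended = NEW.qty * NEW.price
     WHERE rowid = NEW.rowid;
END;
""")

conn.execute(
    "INSERT INTO order_details (order_id, upc, qty, price) "
    "VALUES (1, '123456789', 3, 2.50)"
)
row = conn.execute("SELECT extended FROM order_details").fetchone()
print(row[0])  # 7.5 -- any authorized reader gets it with a plain SELECT
```

The point of the sketch is the last line: once the trigger has materialized the value, no front-end has to know the formula.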

> Also, triggers are
> proprietary in spite of the existence of standards for them.

True, but I don't code triggers by hand; I generate them according to patterns. This is "admissible", as it were, since we are discussing practicalities. The code to assign values to variables and then write them to a row is subtly different in, say, MS SQL and DB/2, but the generator absorbs those differences.
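A hedged sketch of what "generate them according to patterns" might look like. Everything here is an illustrative assumption, not Downs's actual generator, and the DDL templates are schematic rather than exact T-SQL or DB/2 syntax:

```python
# One derived-column pattern, rendered per dialect instead of hand-coded.
# Template shapes are schematic; real dialects differ in further details.
TEMPLATES = {
    "mssql": (
        "CREATE TRIGGER trg_{table}_calc ON {table} AFTER INSERT, UPDATE AS\n"
        "UPDATE t SET {col} = {expr}\n"
        "FROM {table} t JOIN inserted i ON t.id = i.id;"
    ),
    "db2": (
        "CREATE TRIGGER trg_{table}_calc\n"
        "NO CASCADE BEFORE INSERT ON {table}\n"
        "REFERENCING NEW AS n FOR EACH ROW\n"
        "SET n.{col} = {expr};"
    ),
}

def make_trigger(dialect: str, table: str, col: str, expr: str) -> str:
    """Render the same derived-column pattern in a given dialect's syntax.

    The caller supplies expr already phrased for that dialect's row alias.
    """
    return TEMPLATES[dialect].format(table=table, col=col, expr=expr)

ddl = make_trigger("db2", "order_details", "extended", "n.qty * n.price")
print(ddl)
```

Because the pattern, not the trigger, is the unit of maintenance, the proprietary-syntax objection mostly disappears in practice.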

>
> Let me add some more things to the history table while I am here:
>
> CREATE TABLE PriceHistory
> (upc CHAR(9) NOT NULL,
> price DECIMAL (12,4) NOT NULL,
> start_date DATE NOT NULL,
> end_date DATE, -- null means current
> CONSTRAINT history_is_sorted
> CHECK (start_date <= end_date),
> CONSTRAINT periods_are_non_overlapping
> CHECK (NOT EXISTS
> (SELECT *
> FROM PriceHistory AS P1
> WHERE PriceHistory.upc = P1.upc
> AND PriceHistory.start_date <> P1.start_date
> AND PriceHistory.start_date BETWEEN P1.start_date AND
> P1.end_date)),
> CONSTRAINT one_current_price_per_product
> CHECK (NOT EXISTS
> (SELECT upc
> FROM PriceHistory AS P1
> GROUP BY upc
> HAVING COUNT(start_date) <> COUNT(end_date) + 1)),
> ..);
>
> The table level constraints are legal in Standard SQL, but not
> implemented in most products yet.

The use of the history table requires me to read that table in order to get price, so now I'm doing a join and reading another table, instead of a simple pull from the order detail table.
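The contrast Downs describes can be shown side by side. This is a sketch in SQLite via Python's sqlite3; the order_details table, its price column copied onto the row at order time, and the sample data are all assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PriceHistory (
    upc        TEXT NOT NULL,
    price      NUMERIC NOT NULL,
    start_date TEXT NOT NULL,
    end_date   TEXT            -- NULL means current
);
CREATE TABLE order_details (
    order_id   INTEGER,
    upc        TEXT,
    order_date TEXT,
    qty        INTEGER,
    price      NUMERIC         -- price copied onto the row at order time
);
INSERT INTO PriceHistory VALUES ('123456789', 2.00, '2004-01-01', '2004-06-30');
INSERT INTO PriceHistory VALUES ('123456789', 2.50, '2004-07-01', NULL);
INSERT INTO order_details VALUES (1, '123456789', '2004-09-24', 3, 2.50);
""")

# Stored on the row: a simple pull from the order detail table.
stored = conn.execute(
    "SELECT qty * price FROM order_details WHERE order_id = 1"
).fetchone()[0]

# Derived from history: a join plus a date-range probe on a second table.
derived = conn.execute("""
    SELECT d.qty * h.price
      FROM order_details d
      JOIN PriceHistory h
        ON h.upc = d.upc
       AND d.order_date >= h.start_date
       AND (h.end_date IS NULL OR d.order_date <= h.end_date)
     WHERE d.order_id = 1
""").fetchone()[0]

print(stored, derived)  # same answer, but the second reads a second table
```

Both queries return the same number; the difference is purely in how many tables must be read to get it, which is exactly the practical point at issue.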

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Fri Sep 24 2004 - 23:30:51 CEST
