Re: Normalize until neat, Automate until Complete

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Wed, 24 Nov 2004 14:42:35 -0500
Message-ID: <cq4f72-j63.ln1_at_pluto.downsfam.net>


Alan wrote:

<snip>

> 
> "Kenneth Downs" <firstinit.lastname_at_lastnameplusfam.net> wrote in message
> news:o5ve72-r03.ln1_at_pluto.downsfam.net...

>>
>> I would suggest a better slogan would be "Normalize until Neat, Automate
>> until Complete," which actually gives some guidance to the
>> denormalization
>> process. The idea is that user-entered or imported data should be
>> normalized for standard reason, nothing new there. The process of
>> automation requires a normalized database as a starting point, so that
>> the generated columns are built upon a valid base.
>>

<snip>

> 
> I certainly agree that normalizing first gives you the proper base to
> denormalize from. What do you mean by, "automated columns"?

Given any Table X and any Table Y, table automation is any action that causes a new row in Table Y to be created as a result of a write to Table X. I have a few of these working, but am not yet comfortable discussing the systematic approach.

There are these kinds of column automation, in the order that I find them useful:

  1. Fetch. Retrieval of a column from parent table to child table. Triggered by insert/update of foreign key in child table. e.g., items.price -> order_details.price.
  2. Extend. Calculation of a column based on other columns within the row. e.g., extended_price = price * qty.
  3. Aggregate. Any of sum, count, min, max, or avg of a child column. In reality, sum and count are easy to implement safely, avg can be done by composition, min and max present the most trouble at implementation time, so much so that right now I am staying away from them.
  4. Distribution. Copy of a column from parent table to child table. Distinguished from a fetch because it is triggered by a change in the parent column. I actually do not implement this one, as together with Aggregates it can produce update loops, and that is where the trouble *really* starts once you get into the realities of implementation.

What I was alluding to in the OP is that these definitions provide a model for denormalization, the beginnings perhaps of an actual useful theory, instead of merely saying, 'Well shoot, I normalized but I'm finding it unwieldy, maybe I should do something.' These definitions provide me with the framework for deciding what that 'something' is.

From here I hope it is clear that if your system does not allow the automated columns to be updated by UPDATE and INSERT statements, then you have made a huge leap while still protecting data.

This is the real clincher, that you can denormalize if you have a rational method for doing so, and if you can protect against anomalies that normalization protects against. My rationale is:

  1. Normalize user-entered data
  2. Provide the general constraint that automated columns cannot be modified by users in INSERT/UPDATE statements
  3. Provide a systematic framework for extending the normalized database, I contend that this framework is an *automation* framework, and that the definitions above allow you to move forward. -- Kenneth Downs <?php $sig_block="Variable scope? What's that?";?>
Received on Wed Nov 24 2004 - 20:42:35 CET

Original text of this message