Re: Storing derived and derivable data

From: dawn <dawnwolthuis_at_gmail.com>
Date: 5 May 2006 05:12:19 -0700
Message-ID: <1146831139.454696.295070_at_i39g2000cwa.googlegroups.com>


Kenneth Downs wrote:
> dawn wrote:
> > Is there database theory that includes identification of
> >
> > 1. the fact that values for an attribute either were or could have been
> > derived?
> > 2. how values for an attribute were derived?
> > 3. how values for an attribute could have been derived?
> >
>
> Dawn, hello! Long time no see.

Welcome back. Seems like you were out for the academic calendar.

> I saw your post and decided it was time to end my long exile, you have
> asked the questions that are at the heart of my company and project.
>
> In order to even begin to deal with this question, you have to deal with
> the 800lb gorilla called "Normalization", which forbids derived data.

Interesting perspective. I think of SQL Views as permitting derived data, including a means to include an attribute from a function that accesses a stored procedure, for example.

> I
> wrote an essay on that some months ago, which is here:
>
> http://docs.secdat.com/index.php?gppn=Normalization+and+Automation
>
> The states basically that derived data is part of life, and since
> normalization can't deal with it, normalization is not enough to guide
> serious application development. But you also can't throw it away,
> because it gives good things.

Great stuff! I like the name you (or others?) gave this concept. There is a distinction between derived columns/attributes and what you called automated columns. Very good. That is what I'm asking about -- stored derived data, which I will now call "automated" (even if I call them attributes instead of columns) instead of "stored derived" based on

http://docs.secdat.com/index.php?gppn=Review+of+Automations

I like most of what you wrote in your normalization + automation column, with only minor quibbles about the normalization issue. I, too, believe that we should treat our metadata, including business rules, as data. Those writing database-independent software and those writing IDEs typically do this, each reinventing the wheel, of course. Obviously our theory needs to include derived data, but my read on the RM is that it does, just not "in base tables." So, it separates "non-derived" and derived along set lines, rather than along attribute lines. That is a mistake, in my opinion.

Anecdote: In an environment where derived data can be added to a relation, a SQL-trained data modeler wanted to normalize the derived data into the mix, at which point I had a little go-round on that with the upshot that now the derived data are not related to the key, the whole key, and nothing but the key. There is no need to normalize derived data. I think that might be one of your points here too.

I do know that asking some relational modeling folks how to handle atomated columns (yup, that term works great!) puts them way outside their comfort zone where there should never be any redundancy in data. Of course it is that kind of thinking that prompted people to take the whole bloomin' database and port it to another one, reshaping it, deriving stuff, etc, for reporting purposes. Talk about redundancy!

Cheers! --dawn Received on Fri May 05 2006 - 14:12:19 CEST

Original text of this message