Re: Storing derived and derivable data

From: Kenneth Downs <knode.wants.this_at_see.sigblock>
Date: Sat, 06 May 2006 10:15:49 -0400
Message-Id: <2mnui3-tbq.ln1_at_pluto.downsfam.net>


dawn wrote:

>
> agreed
>

>> Second, they cannot create new rows.

>
> Hmm. I guess I do think about derived data as columns rather than
> rows. Other than aggregate rows, I'm having trouble coming up with
> derived rows -- can you give an example?

You know, it's funny you say this. To make a long story short, the only automated creation of rows I have ever used is the materialization of aggregates.

This is done in Andromeda by what I call the auto-insert feature of a foreign key. When a row inserted into a child table would fail RI validation against parent table X, the system instead inserts a row into table X to satisfy the RI.

I say this is funny because it totally surprised me. I had assumed, for no good reason, that I would need to arbitrarily insert rows into any table X based on any event in table Y. But as I worked through real-world cases I discovered that if there was no key between X and Y, it was fiendishly difficult to verify that the data was cascading correctly. So in practice I began to cascade only between related tables. Eventually I realized that the auto-insert foreign key could be used in all cases, and then it dawned on me that this was no coincidence. A definition of an automated write from one table to another cannot make sense unless the data can be demonstrated correct after the fact, and the way we relate rows between tables is called a foreign key.
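
For concreteness, here is a minimal PostgreSQL sketch of the idea. The table names are invented for illustration (this is not Andromeda's generated code): order_lines is the child, and order_totals is the aggregate parent that gets auto-inserted and then accumulates the derived total.

  CREATE TABLE order_totals (
      order_id    integer PRIMARY KEY,
      order_total numeric(12,2) NOT NULL DEFAULT 0
  );

  CREATE TABLE order_lines (
      order_id integer NOT NULL REFERENCES order_totals,
      line_no  integer NOT NULL,
      qty      numeric(10,2) NOT NULL,
      price    numeric(10,2) NOT NULL,
      PRIMARY KEY (order_id, line_no)
  );

  CREATE FUNCTION order_lines_autoinsert() RETURNS trigger AS $$
  BEGIN
      -- auto-insert: create the missing parent row so the RI check
      -- succeeds instead of failing
      INSERT INTO order_totals (order_id)
           SELECT NEW.order_id
            WHERE NOT EXISTS (SELECT 1 FROM order_totals
                               WHERE order_id = NEW.order_id);
      -- cascade the derived aggregate along the same foreign key
      UPDATE order_totals
         SET order_total = order_total + NEW.qty * NEW.price
       WHERE order_id = NEW.order_id;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER trg_order_lines_autoinsert
      BEFORE INSERT ON order_lines
      FOR EACH ROW EXECUTE PROCEDURE order_lines_autoinsert();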

>

>> Another drawback is more of a balance question.  Materializing
>> derivations within the transaction lengthens the transaction,

>
> Yes.
>
>> takes more space,

>
> a whole lot less than if you port a lot of your data elsewhere for
> reporting
>

Ha! This is the second time you've mentioned this; the point is well taken.

>> and
>> increases in some cases the chances of contention.  But then the reads
>> are
>> fast.  With a view, the transactions are kept shorter, at the price of an
>> ever-increasing read cost as the complexity increases.
>>
>> Finally, SQL itself is a piss-poor language in which to store the
>> authoritative fundamental definition of derived data,

>
> I'm sure that's true. I did work on a project to define the language
> used to specify derived data before, but only at the very start where
> we could argue the approaches. Too much fun.

I don't know if you saw my second post to your original question, but I have a language defined and it works well. My largest project has about 250 heavily automated tables, described in a data file about 9,000 lines long.

Success was elusive until I followed this track:

  1. Describe informally what we need to know about tables, including automation. I need to know types, keys, formulas, and so on.
  2. Design the tables to record what we need to know about tables; IOW, define the data dictionary.
  3. Come up with a file format that can be easily typed and read, and can also be easily parsed and loaded into tables for processing.

From a theoretical standpoint it is important to stress that statements in a data dictionary language are propositions, which, per Codd, makes them data. The path to a working system is to concentrate first on what data you are trying to record, and only then to come up with a syntax for it.
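
For illustration, step 2 might come out looking something like the sketch below. The names are invented, and the real Andromeda dictionary surely differs in detail:

  -- tables that describe tables: each row is a proposition
  -- about the schema
  CREATE TABLE dd_tables (
      table_id    varchar(50) PRIMARY KEY,
      description varchar(200)
  );

  CREATE TABLE dd_columns (
      table_id    varchar(50) NOT NULL REFERENCES dd_tables,
      column_id   varchar(50) NOT NULL,
      type_id     varchar(20) NOT NULL,  -- char, int, numb, date ...
      primary_key char(1) DEFAULT 'N',   -- part of the key?
      formula     varchar(2000),         -- derivation, if the column is derived
      PRIMARY KEY (table_id, column_id)
  );

  -- "order_lines has a derived column extended_price, computed
  -- as qty * price"
  INSERT INTO dd_tables VALUES ('order_lines', 'Order line items');
  INSERT INTO dd_columns VALUES
      ('order_lines', 'extended_price', 'numb', 'N', 'qty * price');

The file format of step 3 is then just a convenient way to type those propositions in; parsing the file amounts to loading these tables.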

>

>> so the view statement
>> itself can't be your method of definition.  You need a better language in
>> which to define the variations, and then you decide whether to use a view
>> as an implementation method.

>
> Of course I don't use SQL views to implement anything (when not using
> a SQL DBMS).

LOL
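
To make that distinction concrete: suppose the dictionary above records a formula for an order_total column. Whether it renders as a view or as a materialized column is then purely an implementation decision. A hypothetical sketch of the view option, using the same invented names as before:

  -- implementation choice A: render the derivation as a view;
  -- writes stay short, but the read cost grows with complexity
  CREATE VIEW v_order_totals AS
      SELECT order_id, SUM(qty * price) AS order_total
        FROM order_lines
       GROUP BY order_id;

Choice B, the materialized column kept current by a trigger, is the earlier sketch. Either way, the authoritative definition lives in the dictionary, not in the view statement.
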
>

>> > Anecdote: In an environment where derived data can be added to a
>> > relation, a SQL-trained data modeler wanted to normalize the derived
>> > data into the mix, at which point I had a little go-round on that with
>> > the upshot that now the derived data are not related to the key, the
>> > whole key, and nothing but the key.  There is no need to normalize
>> > derived data.  I think that might be one of your points here too.
>>
>> I don't understand what each of you were proposing.  How did he want to
>> normalize the derived data?

>
> The specific case I recall is a "file" with a three-part composite
> key, each part being a dimension one might use for slicing and
> dicing. The file itself didn't have much base data; it would be a
> relationship file in an ERD among the three parts of the key. I
> wanted to put derived data related to just one part of the key (for
> each part) in the file, which turned the file into a virtual cube.
> The DBA said it was not normalized, since the virtual fields were not
> related to the whole key. (He was a she, by the way.)

Yeah, a shame. She was correct by the book, but wrong in substance: the rules of normalization protect base data from update anomalies, and as you say, there is no need to normalize derived data.
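
If I am reading the anecdote right, the shape of the file would be something like this hypothetical sketch, where the derived columns deliberately depend on only part of the key:

  -- a three-part composite key plus derived columns keyed to single
  -- parts of it; for base data this would violate 2NF, but for
  -- derived data the partial-key dependency is the whole point
  CREATE TABLE sales_cube (
      customer_id    varchar(10),
      product_id     varchar(10),
      period_id      varchar(10),
      qty            numeric(10,2),   -- base data
      customer_total numeric(12,2),   -- derived: depends on customer_id alone
      product_total  numeric(12,2),   -- derived: depends on product_id alone
      PRIMARY KEY (customer_id, product_id, period_id)
  );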

-- 
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth_at_(Sec)ure(Dat)a(.com)