Re: Mapping arbitrary number of attributes to DB

From: David Cressey <dcressey_at_verizon.net>
Date: Tue, 24 Oct 2006 11:34:59 GMT
Message-ID: <DTm%g.7058$ke4.4635_at_trndny02>


"Joshua J. Kugler" <joshua_at_eeinternet.com> wrote in message news:453d522f$0$19658$88260bb3_at_free.teranews.com...
> David Portas wrote:
>
> >> Thus, each
> >> station can have a number of attributes that vary from one to dozens.
I
> >> can't tell the scientists "You may have 32 sensors per station, no
more,
> >> no
> >> less." That just won't work, for reasons I hope are obvious. :)
> >
> > No-one except you suggested that.
>
> Sorry, I didn't mean to suggest that anyone did...just thinking out-loud I
> guess.
>
> > So you'll need attributes for Wind Speed, Temperature, Humidity etc.
> > Obviously those are very different domains so they belong in separate
> > columns. If each type of reading is totally independent of the others
> > then one would expect to see a separate table for each, with a common
> > key presumably comprising the station number and a timestamp. This is
> > of course totally unsupported guesswork on my part.
>
> What you suggest is good, and might work if I had a fixed domain and set
of
> sensors. But the set of sensors, and types, could change at any time. If
> it were feasible to take that approach (currently, it's not), we could go
> into production, and two months down the road, we would need to import
data
> for sensor type Zerple, that does not fit in any of our previous profiles,
> and we might have 20 different readings of type Zerple on each sample
(line
> of the data file). Creating a new table for each type of new sensor is
not
> going to work.
>
> I admit, I'm working with a problem that is *very* database unfriendly. I
> guess I'll see what I can read and come up with.
>
> Thanks for your input so far.

PMFJI. Now that I see what you are up against, I have a better handle on why you are perplexed.

If I can generalize a little bit from what you've written here, you want a data model that will be impervious to changing information requirements. In essence, the introduction of a new, and hitherto not conceived type of sensor introduces new data types into the universe of discourse, and possibly new entites as well.

The problem of coming up with a model so flexible that it can automatically adapt to any possible change in the universe of discourse is one that has vexed practitioners of relational systems for at least 36 years.

I think the best you are going to be able to do is to come up with some kind of "meta model", a model that models models. I see your proposed solution as the first step in this direction. It's highly flexible, but almost useless. The next step, as I see it,
is to come up with a good model of the CURRENT state of the subject matter, given the CURRENT information supplied by sensors.

This should be a straightforward exercise in data analysis and database design, even if it takes a while. I'll call this the normalized database.

After that you need to come up with processes (programs) that will transform input data in the form you poposed into the form I proposed. You will have to make your programs work on incremental inputs, most likely, in order to keep up.

Then you need to make metadata updates to the normalized database as simple, easy, and automatic as you can. You will need to make metadata updates whenever a new kind of sensor is introduced.

After that, you might want to build a reporting database along some kind of OLAP principles that allow your data analysts to adapt more or less automatically tothe changing world the sensors depict. ETL between the normalized database and the reporting database will be a real bear, because of the ongoing metadata updates to the normalized database.

How many years do you plan on working before retirement? ;-) Received on Tue Oct 24 2006 - 13:34:59 CEST

Original text of this message