Massively distributed data

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Tue, 11 May 2004 22:33:22 -0500
Message-ID: <c7s5um$7nk$1_at_news.netins.net>



I think this is a fascinating question and maybe some of you will agree and will have suggestions. I sat next to Jim Waldo from Sun at a lunch at a Jini Community Conference in Boston earlier this year. He was talking about medical information coming directly from people in some way. The idea would be to have information about the health of an individual come from their body. This was still in the stage of formulating the problem statement, so the rest is just related to my own reflections on the problem.

Possible simple scenario:
People have the option of wearing a patch that records an ID (for example, a US Social Security Number) and is able to capture the person's temperature. Now, let's just grant that somehow this information can be communicated, in a secure fashion, to some health service to which this person subscribes. Similar to a home security network, this person can be alerted by their health organization when their temperature goes from a green to a yellow or red zone. Perhaps there is another service, to which this person also subscribes, for saliva samples. These are not continuously present, but a sample is taken automatically whenever the person brushes their teeth.
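To make the green/yellow/red idea concrete, here is a minimal sketch in Python. The thresholds (37.5 and 39.0 degrees Celsius) and the names `zone` and `alerts` are assumptions made up for illustration; the scenario above does not specify any of them. The sketch raises an alert whenever a reading moves into a worse zone than the previous reading.

```python
# Hypothetical thresholds in degrees Celsius; the post does not give any.
YELLOW_AT = 37.5
RED_AT = 39.0

# Ordering used to decide whether a zone change is a worsening.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def zone(temp_c: float) -> str:
    """Classify one temperature reading into a zone."""
    if temp_c >= RED_AT:
        return "red"
    if temp_c >= YELLOW_AT:
        return "yellow"
    return "green"


def alerts(readings):
    """Yield (previous_zone, new_zone) each time the zone gets worse."""
    prev = None
    for temp_c in readings:
        cur = zone(temp_c)
        if prev is not None and SEVERITY[cur] > SEVERITY[prev]:
            yield (prev, cur)
        prev = cur


if __name__ == "__main__":
    for before, after in alerts([36.8, 37.0, 37.8, 39.2, 36.9]):
        print(f"alert: {before} -> {after}")
```

Whether a drop back to green should also notify the person is a design choice left open here.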

My not-necessarily-brilliant thinking is that each such attribute (variable) for each entity (person) could be registered as/with a (software; web) service. Instead of being gathered into a central location, the data in this highly distributed "data base" would typically be viewed not as sets of values but as a graph in which each value is a node.
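One way to picture "one service per attribute per entity" is a registry keyed by (entity, attribute), where each entry is only a handle for reading the live value, not the value itself. The sketch below is in-process Python under that assumption; `Registry`, `register`, and `lookup` are hypothetical names, and a real system would use network discovery (Jini-style lookup, for instance) rather than a dictionary.

```python
from typing import Callable, Dict, Tuple

# A node in the graph is identified by (entity id, attribute name).
Key = Tuple[str, str]
Reader = Callable[[], float]


class Registry:
    """Hypothetical lookup service: it stores handles to values, never the values."""

    def __init__(self) -> None:
        self._services: Dict[Key, Reader] = {}

    def register(self, entity: str, attribute: str, read: Reader) -> None:
        """Register one live value (one node) under its entity and attribute."""
        self._services[(entity, attribute)] = read

    def lookup(self, attribute: str) -> Dict[str, Reader]:
        """Return the readers for one attribute across all entities."""
        return {e: read for (e, a), read in self._services.items() if a == attribute}


if __name__ == "__main__":
    reg = Registry()
    reg.register("p1", "temperature", lambda: 37.0)
    reg.register("p1", "saliva_ph", lambda: 6.8)
    reg.register("p2", "temperature", lambda: 38.2)
    for person, read in reg.lookup("temperature").items():
        print(person, read())
```

The lambdas stand in for whatever the patch or sampling device would actually expose.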

It seems to me that without moving hordes of data around or refashioning this tree/graph into relations, the live data (the word "live" takes on new meaning!) could be researched by a service that discovers the data it is looking for and reports against them. Health-o-meters (services) could crawl through this live data without rehosting it (potentially for all the people in the world).
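A rough sketch of what "reporting without rehosting" might look like: a crawler visits each live source once, folds the value into running aggregates, and discards it. This is Python with made-up names (`crawl_summary`, `threshold`), and it assumes each source can be read through a simple zero-argument callable, which glosses over discovery, security, and failures.

```python
import math
from typing import Callable, Dict, Iterable


def crawl_summary(sources: Iterable[Callable[[], float]], threshold: float) -> Dict[str, float]:
    """Visit each live source once and keep only running aggregates.

    No individual reading is stored after it has been folded into the totals.
    """
    count = 0
    above = 0
    total = 0.0
    for read in sources:
        value = read()          # fetch the live value from its own source
        count += 1
        total += value
        if value >= threshold:
            above += 1
    mean = total / count if count else math.nan
    return {"count": count, "above": above, "mean": mean}


if __name__ == "__main__":
    # Stand-ins for per-person temperature services.
    live = [lambda: 36.5, lambda: 38.5, lambda: 37.0]
    print(crawl_summary(live, threshold=38.0))
```

Only three numbers leave the crawl, however many people are visited, so the report never becomes a copy of the data.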

This massively distributed database might then require (or at least be well-served by) a separate "data source" for each value in the database. Which brings me to my question about new ways of specifying data sources. If nothing is coming down the pike for this, then I'll just have to come up with it myself, eh?

This seems to me to be an example of where a graph of data just makes a whole lot more sense than applying only relational operators, but I'm certain I have not thought through every possible approach to such a problem. Does anyone have any ideas on what database theory would be relevant to this not-highly-formulated problem statement for a massively distributed database?

--dawn

Received on Wed May 12 2004 - 05:33:22 CEST
