Re: Two examples of semi structured data.

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Sat, 21 Aug 2004 10:59:03 +0200
Message-ID: <41270edb$0$21106$e4fe514c_at_news.xs4all.nl>


Jan Hidders wrote:

> mAsterdam wrote:
> 

>>Jan Hidders wrote:
>>
>>>The common theme is here that you would like to be able to query and
>>>manipulate data that although it has structure in some way this structure
>>>is not (or only partially) made explicit.
>>>
>>>A nice old paper (IMO) that discusses the main issues can be found here:
>>>
>>>http://citeseer.ist.psu.edu/cache/papers/cs/558/http:zSzzSzwww-db.stanford.eduzSzpubzSzpaperszSzicdt97.semistructured.pdf/abiteboul97querying.pdf
>>
>>To get the goat, the wolf and the cabbage across one has to go back
>>and forth - there is no escape in that. My impression of this paper:
>>they sacrifice the goat (and maybe the cabbage if the wolf is
>>smart) by putting all three of them in one boat.
>
> Could you be a bit more specific, Danny?

I'll try. Please keep in mind, though, that it is still just that: an impression. It is about taste and sense of direction; I am _not_ disputing the validity or the _general_ relevance of their (Serge Abiteboul and his coreferents) reasoning.

So first a thing or two about my tastes/prejudice/opinion/whatever.

You may recall from the recent thread "It don't mean a thing if it ain't got ..." what I think about the widespread but futile habit of thinking of data as potentially meaningless.

Data capture is tough. It is one of the most essential steps in getting from text to information. It compares nicely to the capture step in audio-visual production - think cameras, microphones, synthesizers, filters, signal levels, delays and recording.

My opinion: (data) capture is widely underestimated in both value and complexity.

Now to the document, "Querying Semi-Structured Data". When I read texts like: "Some of this data is _raw_ data, e.g., images or sound." I infer that the author talks about potentially meaningless _signs_, not about _data_.

I don't have to wait very long to verify that the damage of this non-choice is done. "We call here _semi-structured data_ this data that is (from a particular viewpoint) neither raw nor strictly typed, i.e. not table-oriented as in a relational model or sorted-graph as in object databases." Well (please keep in mind I am making statements of taste, I am *not* refuting the author's argument): by lumping together sorted-graphs and tables in one category, "strictly typed", suddenly all structure _inherent_ in the data is out of focus.

The rest of the document will have no real relevance to my thinking on the matter. They *will* touch on (but never delve into) several topics that are relevant - so maybe it has some relevance as an inventory of topics.

"To completely structure the data often remains an elusive goal" I am so out of here - where is the door!

I hope this is somewhat clearer to you than my earlier comment.

BTW Jan, my name is no secret, but in trying to keep it out of IT-public places I use mAsterdam as a pseudonym. I would appreciate it if you would, too. No harm done, though.