Re: Two examples of semi structured data.

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Sun, 22 Aug 2004 03:43:37 +0200
Message-ID: <4127fa4e$0$21106$e4fe514c_at_news.xs4all.nl>


Jan Hidders wrote:
> mAsterdam wrote:

>>Jan Hidders wrote:
>>>mAsterdam wrote:
>>>>Jan Hidders wrote:
>>>>
>>>>>The common theme is here that you would like to be able to query and
>>>>>manipulate data that although it has structure in some way this structure
>>>>>is not (or only partially) made explicit.
>>>>>
>>>>>A nice old paper (IMO) that discusses the main issues can be found here:
>>>>>
>>>>>http://citeseer.ist.psu.edu/cache/papers/cs/558/http:zSzzSzwww-db.stanford.eduzSzpubzSzpaperszSzicdt97.semistructured.pdf/abiteboul97querying.pdf
>>>>
>>>>To get the goat, the wolf and the cabbage across one has to go back
>>>>and forth - there is no escape in that. My impression of this paper:
>>>>they sacrifice the goat (and maybe the cabbage if the wolf is
>>>>smart) by putting all three of them in one boat.
>>>
>>>Could you be a bit more specific, Danny? 
>>
>>I'll try. Please keep in mind though, that it
>>is still just that: an impression. It is about
>>taste and sense of direction; I am _not_ disputing
>>either the validity or the _general_ relevance of their
>>(Serge Abiteboul and his coreferents) reasoning.

>
> Well, "de gustibus non est disputandum" so I'll concentrate on the
> "direction" part.
>
>>So first a thing or two about my tastes/prejudice/opinion/whatever.
>>
>>You may recall from the recent thread
>>"It don't mean a thing if it ain't got ..."
>>what I think of the widespread but futile
>>habit of regarding data as potentially meaningless.

>
> I have to admit I have missed that thread. I have a busy day-time job that
> regularly also eats up my night-time so I have to be selective. But I
> suspect the following sentence sums it up, right?

I see why you suspected that, but no.
A highly biased summary: There is a basic difference, often overlooked yet crucial to current misunderstandings, between the _meaning_ and the _use_ of data. A link (to compensate for the bias):
http://groups.google.nl/groups?hl=nl&lr=&ie=UTF-8&threadm=40bcf91b%240%2434762%24e4fe514c%40news.xs4all.nl&rnum=1&prev=/groups%3Fq%3Dmean%2Ba%2Bthing%2Bgroup:comp.databases.theory%26hl%3Dnl%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.databases.theory%26selm%3D40bcf91b%25240%252434762%2524e4fe514c%2540news.xs4all.nl%26rnum%3D1
You might like the Peano example.

>>My opinion: (Data)capture is widely underestimated
>>in value and complexity.
>>
>>Now to the document, "Querying Semi-Structured Data".
>>When I read texts like: "Some of this data is _raw_
>>data, e.g., images or sound." I infer that the author
>>talks about potentially meaningless _signs_, not about
>>_data_.

>
> Er, no. Note the "(from a particular viewpoint)" phrase, which is crucial
> here. Serge argues that sometimes, from a particular viewpoint, for
> example that of a certain type of user or a certain application, you are really
> not interested in any structure that may or may not be hidden in a
> particular stream of bits; they are just a stream of bits and that's
> it.

That's it. No data, right? Signs.

> That certainly does not exclude the possibility that from *another*
> point of view there is some worthwhile structure to be
> discovered there. Think, for example, of the payload in a package in some
> communication protocol.

Signs and packages of signs. Structure yes, data no.

> At one level of the protocol stack this is just a
> list of bits, at another level the same list of bits may be a certain
> service request with some parameters. Whether data is "raw" or not is in
> the eye of the beholder, it is not an objective quality of the thing
> itself.

Put differently: we can store stuff, move it around from one place to another, use structures for that - but without interpretation ("the eye of the beholder") there is no message, no meaning, no communication.

>>I don't have to wait very long to verify that the damage of
>>this non-choice is done. "We call here _semi-structured
>>data_ this data that is (from a particular viewpoint) neither
>>raw nor strictly typed, i.e. not table-oriented as in a relational
>>model or sorted-graph as in object databases." Well (please
>>keep in mind I am making statements of taste, I am *not* refuting
>>the author's argument): by lumping together sorted-graphs and tables
>>in one category, "strictly typed", suddenly all structure _inherent_
>>in the data is out of focus.

>
> Where do you get that idea? The only thing that is said is that if that
> data is typed, which basically means that we know its structure, and if

... Sorry to interrupt. The only structure we now know is structure imposed on the signs to be stored or forwarded or represented. This structure does not determine meaning, neither is it determined by meaning. Buzzword bingoish: it is orthogonal to meaning.

Date: January 5, 1887
We have a type, we have a structure to store, forward and represent signs. What is missing?

> that is all the structure we are interested in, then it is not considered
> semi-structured. On the other hand, if it is completely untyped and
> without structure, but we are also not interested (from the chosen
> perspective!!) in any hidden structure, then it is also not considered to
> be semi-structured.

Re-introducing meaning after dismissing it at the start is troublesome: How (and why) does being interested suddenly come into play? How does it relate to the discussed (semi-)structuredness?

>>The rest of the document will have no real relevance to my
>>thinking on the matter. They *will* touch (but never delve
>>into) several topics that are relevant, so maybe it has some
>>value as an inventory of topics.

>
> That is how it is meant. It is more like a research agenda: problems
> that he thinks are interesting and where research (by database
> theoreticians) looks likely to lead to something in the direction of a solution.
>
>>"To completely structure the data often remains an elusive goal"
>>I am so out of here - where is the door!

>
> Don't you think you're overreacting a little? The only thing that is said
> is that it is probably not always possible to retrieve all the hidden
> structure we are interested in. Nowhere is it said by Serge that we
> shouldn't try, or even that we shouldn't try very hard. On the contrary.
> And in fact, right now, there is a lot of research being done on that.

There is a multitude of entangled structures, present and imposed. The goal is to unravel, unveil them in order to satisfy our curiosity and maybe even do something useful with our findings, building new things - using existing and new structures of course - along the way. "To completely structure" as a goal (albeit elusive) is a sign of seriously overestimating our capability to understand.

>>BTW Jan, my name is no secret, but in trying to keep it out of
>>IT-public places I use mAsterdam as a pseudonym.
>>I would appreciate it if you would, too. No harm done, though.

>
> Ok. I disagree (IMO there are some very good arguments why everybody
> should always sign with their real name unless there is an important
> reason not to) but I apologize and will respect your wishes.

Thank you for that (and really really no apology necessary). I am sure you already know, but just to make sure other readers know as well: I very much value your insights and I enjoy your contributions to this newsgroup. I am almost sorry I cannot agree with you on this topic :-)

Received on Sun Aug 22 2004 - 03:43:37 CEST
