Re: Two examples of semi structured data.

From: Jan Hidders <jan.hidders_at_REMOVETHIS.pandora.be>
Date: Sun, 22 Aug 2004 09:42:03 GMT
Message-ID: <pan.2004.08.22.09.43.56.603890_at_REMOVETHIS.pandora.be>


On Sun, 22 Aug 2004 03:43:37 +0200, mAsterdam wrote:

> Jan Hidders wrote:

>> mAsterdam wrote:
>>>
>>>Now to the document, "Querying Semi-Structured Data".
>>>When I read texts like: "Some of this data is C<raw>
>>>data, e.g., images or sound." I infer that the author
>>>talks about potentially meaningless C<signs>, not about
>>>C<data>.

>>
>> Er, no. Note the "(from a particular viewpoint)" phrase, which is crucial
>> here. Serge argues that sometimes, from a particular viewpoint of for
>> example a certain type of user or a certain application, you are really
>> not interested in any structure that may or may not be hidden in a
>> particular stream of bits; they are just a stream of bits and that's
>> it.
>
> That's it. No data, right? Signs.

Yes. Or as it is usually taught in academia: "No information, just data."

>> That does certainly not exclude the possibility that from *another*
>> point of view there certainly is some worthwhile structure to be
>> discovered there. Think of for example the payload in a package in some
>> communication protocol.

> 
> Signs and packages of signs. Structure yes, data no.
> 

>> At one level of the protocol stack this is just a
>> list of bits, at another level the same list of bits may be a certain
>> service request with some parameters. Whether data is "raw" or not is in
>> the eye of the beholder, it is not an objective quality of the thing
>> itself.
> 
> Put differently: we can store stuff, move it around from
> one place to another, use structures for that - but
> without interpretation ("the eye of the beholder") there
> is no message, no meaning, no communication.

Yes. Is that a problem? Mostly this is just a matter of definition and therefore meaningless.  
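
A minimal sketch of the protocol-stack point, in Python. The wire format (a one-byte opcode followed by two 16-bit parameters) and the function names are invented for this illustration and do not come from any real protocol. The same bytes are "raw" to one layer and a structured request to another.

```python
import struct

# Hypothetical wire format, assumed only for this example:
# 1-byte opcode, then two unsigned 16-bit big-endian parameters.
REQUEST_FORMAT = ">BHH"

payload = struct.pack(REQUEST_FORMAT, 7, 1024, 42)


def transport_view(data: bytes) -> int:
    """Lower layer: the payload is an opaque run of bytes; only its size matters."""
    return len(data)


def application_view(data: bytes) -> dict:
    """Higher layer: the very same bytes are read as a service request."""
    opcode, first, second = struct.unpack(REQUEST_FORMAT, data)
    return {"opcode": opcode, "params": (first, second)}


print(transport_view(payload))
print(application_view(payload))
```

Nothing about the bytes themselves changes between the two calls; only the reader's agreement about the format does.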

>>>I don't have to wait very long to verify that the damage of
>>>this non-choice is done. "We call here C<semi-structured
>>>data> this data that is (from a particular viewpoint) neither
>>>raw nor strictly typed, i.e. not table-oriented as in a relational
>>>model or sorted-graph as in object databases." Well (please
>>>keep in mind I am making statements of taste, I am *not* refuting
>>>the author's argument): by lumping together sorted-graphs and tables
>>>in one category "strictly typed" suddenly all structure _inherent_
>>>in the data is out of focus.

>>
>> Where do you get that idea? The only thing that is said is that if that
>> data is typed, which basically means that we know its structure, and if
> 
> ... Sorry to interrupt. The only structure we now know is structure
> imposed on the signs to be stored or forwarded or represented. This 
> structure does not determine meaning, neither is it determined by 
> meaning. Buzzword bingoish: it is orthogonal to meaning.

No, it is not orthogonal because it can be, and usually is, the carrier of meaning. I could send a simple string with flat text or I could add structure in the form of XML mark-up and then send it to you. If we have agreed before on what this markup means then the added structure will add additional meaning.
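
A small sketch of the mark-up example, using Python's standard `xml.etree.ElementTree`. The element names `meeting`, `person` and `date` are hypothetical; they stand in for whatever vocabulary sender and receiver agreed on beforehand.

```python
import xml.etree.ElementTree as ET

# The flat message: just characters, no agreed structure.
flat = "Serge 2004-08-22"

# The same characters with mark-up added (tag names are assumptions).
marked_up = "<meeting><person>Serge</person> <date>2004-08-22</date></meeting>"

root = ET.fromstring(marked_up)

# Stripping the mark-up gives back exactly the flat text.
same_text = "".join(root.itertext())

# A receiver who knows the agreed tags can address the parts by role.
person = root.findtext("person")
date = root.findtext("date")

print(same_text == flat)
print(person, date)
```

The character content is identical in both messages; the extra meaning comes only from the structure plus the prior agreement about what the tags stand for.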

>> that is all the structure we are interested in, then it is not considered
>> semi-structured. On the other hand, if it is completely untyped and
>> without structure, but we are also not interested (from the chosen
>> perspective!!) in any hidden structure, then it is also not considered to
>> be semistructured.

> 
> Re-introducing meaning after dismissing it at the
> start is troublesome: How (and why) does being
> interested suddenly come into play? How does it
> relate to the discussed (semi-)structuredness?

You seem to have a problem with the fact that the meaning of data is not an objective property of said data but also depends on those that deal with that data. Is that correct? Why does that bother you so much?

>>>"To completely structure the data often remains an elusive goal"
>>>I am so out of here - where is the door!

>>
>> Don't you think you're overreacting a little? The only thing that is said
>> is that it is probably not always possible to retrieve all the hidden
>> structure we are interested in. Nowhere is it said by Serge that we
>> shouldn't try, or even that we shouldn't try very hard. On the contrary.
>> And in fact, right now, there is a lot of research being done on that.
> 
> There is a multitude of entangled structures, present
> and imposed. The goal is to unravel, unveil them in
> order to satisfy our curiosity and maybe even do
> something useful with our findings, building new things
> - using existing and new structures of course -
> along the way. "To completely structure" as a goal
> (albeit elusive) is a sign of seriously
> overestimating our capability to understand.

Yes. I don't see the problem here. The fact that we may not be able to reach the borders of our universe doesn't imply in any way that we shouldn't do space exploration. It's almost as if you have a deep psychological need to have everything structured.

-- Jan Hidders
Received on Sun Aug 22 2004 - 11:42:03 CEST
