Re: Two examples of semi structured data.

From: Dawn M. Wolthuis <dwolt_at_tincat-group.comREMOVE>
Date: Sat, 21 Aug 2004 08:58:01 -0500
Message-ID: <cg7kdc$qjp$1_at_news.netins.net>


"Gene Wirchenko" <genew_at_mail.ocis.net> wrote in message news:paadi0h8cadi42vtra3q2hmflvdi3601v4_at_4ax.com...
> mAsterdam <mAsterdam_at_vrijdag.org> wrote:
>
> >Gene Wirchenko wrote:
> >
> >> "Laconic2" <laconic2_at_comcast.net> wrote:
>
> >>>I'd like to revive the discussion of semi structured data that was held a
> >>>few months ago. This time, I'd like to use two examples of "semi
> >>>structured" data as a starting place.
> >>>
> >>>The two examples are the MS Windows Registry, and cookies.
> >>>
> >>>Why do I call them semi-structured? Well, it's clear to me that each of
> >>>them adheres to some structure that is a little more rigid than "plain
> >>>text". It's also clear to me that control over coherence of the content is
> >>>not unified.
> >>
> >> They are structured. They are not as structured as you would
> >> like.
> >
> >What is their structure?
>
> RTFM for each case. As Laconic2 said: "Well, it's clear to me
> that each of them adheres to some structure...". If nothing else,
> they are series of octets.

I don't have a good understanding of the registry or cookies, so I don't know if or where there are any foreign keys, for example. Without looking too deeply, they look to me like indexed flat files. But to build on what Gene said, the data types for various fields seem to be very broad (loose typing). If you back the type up far enough (for example, make everything a string type), then you don't have as many constraints from the "database engine". There do seem to be fixed fields (not fixed-length), even if they are redefined in different records. Although there might be functions to parse values for some purpose or other, there are no functions that attempt to "understand" what the human was trying to do in their native language. There is computer-data precision in the values, which are interpreted just as typical values from any database are (if this or that or the other thing is in this field, then do this).
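As a rough illustration of "backing the type up" to strings, here is a minimal sketch assuming a cookie-style record of the form `name=value; name2=value2` (the function name and sample data are hypothetical, not from any particular browser or library). The format enforces only the name=value structure; deciding that a field holds an integer is left to the application.

```python
# Hypothetical sketch: a cookie-style record in which every value is a string.
# The format enforces only name=value pairs; it does not check value types.

def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a 'name=value; name2=value2' string into a dict of strings."""
    fields: dict[str, str] = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # A field with no '=' is kept with an empty value rather than rejected.
        fields[name.strip()] = value.strip() if sep else ""
    return fields


record = parse_cookie_header("session=abc123; visits=42; theme=dark")
# The application, not the storage format, decides that "visits" is an integer.
visits = int(record["visits"])
print(record, visits)
```

Nothing stops another record from storing `visits=lots`; that constraint would have to live in application code.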

The folks talking about semi-structured data these days are often referring to documents where the typing is loose (e.g., strings) but where there are a lot of functions that work on that text, for example to find documents with a specified set of words found within a few sentences of each other (which requires a function to find a sentence).
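A minimal sketch of that kind of proximity search, assuming a naive sentence splitter that breaks after '.', '!' or '?' (real text-retrieval systems use far better tokenizers; all names here are hypothetical). It reports whether every target word appears somewhere within a window of consecutive sentences.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words_within_sentences(text: str, words: set[str], window: int) -> bool:
    """True if every word in `words` occurs within some run of `window` consecutive sentences."""
    # Each sentence becomes the set of lowercase words it contains.
    sentences = [set(re.findall(r"[a-z']+", s.lower())) for s in split_sentences(text)]
    targets = {w.lower() for w in words}
    for start in range(len(sentences)):
        seen: set[str] = set()
        for s in sentences[start:start + window]:
            seen |= s
        if targets <= seen:
            return True
    return False


doc = "The registry stores keys. Each key holds values. Cookies are sent by the browser."
print(words_within_sentences(doc, {"registry", "values"}, window=2))   # True
print(words_within_sentences(doc, {"registry", "cookies"}, window=2))  # False
```

The data stays loosely typed (plain strings); all of the "structure" comes from the functions applied to it.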

I'm off my turf on this one, but I liked the question, and I certainly could be way off base in this response. Cheers! --dawn

> Sincerely,
>
> Gene Wirchenko
Received on Sat Aug 21 2004 - 15:58:01 CEST
