Re: O'Reilly interview with Date

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 9 Aug 2005 08:49:07 -0700
Message-ID: <1123602547.655699.156650_at_z14g2000cwz.googlegroups.com>


erk wrote:

>

> How can "data" not be structured in some way? Given your comments, I
> have no clue what "semi-structured" actually means - although I
> probably didn't before either.

Just a quick note: I now believe that the term "semi-structured" does actually have a legitimate use, although I will grant that most usages one sees do not use the term in any formal way.

"Semi-structured" data refers to data in a context where some of the structure of the data is visible to the application, and some is not.

For example, if you have a network packet with a header and a payload, you might have an application that knows the schema of the header but not the payload, and uses data in the header to choose an encryption method for the payload. The application treats the payload as an opaque stream of bytes, hence it is "unstructured" to that application. Of course, the structure is still there, and some other application down the line will know what the schema for it is. Clearly, if *no* application knew the schema, it would just be noise and not data.
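To make the packet example concrete, here is a minimal sketch in Python. The header layout (version, cipher id, payload length) and the cipher table are invented for illustration and are not taken from any real protocol; the point is only that the code decodes the header and never looks inside the payload.

```python
import struct

# Hypothetical header layout (an assumption, not a real protocol):
# 1 byte version, 1 byte cipher id, 2 bytes payload length, network byte order.
HEADER = struct.Struct("!BBH")

# Hypothetical mapping from cipher id to an encryption method name.
CIPHERS = {0: "none", 1: "aes-128", 2: "chacha20"}


def split_packet(packet: bytes):
    """Decode the header fields; return the payload as uninterpreted bytes."""
    version, cipher_id, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    # The header is structured for this application: its fields have names.
    header = {"version": version, "cipher": CIPHERS[cipher_id]}
    # The payload is opaque here; only a downstream application knows its schema.
    return header, payload


# The payload happens to be JSON, but split_packet never inspects it.
body = b'{"temp": 21}'
packet = HEADER.pack(1, 2, len(body)) + body
header, payload = split_packet(packet)
print(header["cipher"], len(payload))
```

Some later consumer could parse the payload as JSON; to this code it is just a run of bytes whose length the header announces.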

I was surprised to discover that this term actually can be useful. Knowing a useful definition of the word also makes it clearer when one encounters a non-useful use of the word. Lots of low-end XML people use the term to mean, "I don't know what schemas are for."

Marshall

Received on Tue Aug 09 2005 - 17:49:07 CEST
