Re: Resiliency To New Data Requirements

From: JOG <jog_at_cs.nott.ac.uk>
Date: 17 Aug 2006 02:50:25 -0700
Message-ID: <1155808225.557316.217500_at_74g2000cwt.googlegroups.com>


dawn wrote:
> JOG wrote:
> > dawn wrote:
> > > Perhaps there is a theory definition of the word structure that you are
> > > using to draw the conclusion that the web does not have structured
> > > data. My take is that a single page can be a node/attribute with a
> > > value that is the html, for example, with directional paths to other
> > > nodes for which there are links in the page. Structure, no?
> >
> > No, Marshall is correct. Information such as that on the web is known
> > in the scientific literature as Unstructured data. That which comes
> > between that and relationally represented data is known as
> > Semi-structured data.
>
> Yes, I've read those terms. I do understand the use of "unstructured"
> within an attribute value, such as an attribute whose value is a
> document. That doesn't make the whole unstructured. The fact that a
> database holds music doesn't make the database any less structured.
> There can be a structured database that includes unstructured attribute
> values.
>
> R(URL, html, foreignKeyList)
>
> That's some structure, right? For the subset of the web with xhtml
> backed by a schema, we would perhaps be able to show more structure.
>
> > Definitions are woefully slapdash, but Google
> > scholar will supply a whole host of papers on the subject.
>
> Yup, and those papers can define unstructured and semi-structured
> however they want. I'm just saying that the data is also structured.
> You could put the data into a relational database. I could put it into
> a Pick database. Right now it is in a highly distributed database that
> has a structure, even if there is "unstructured" data within it.
>

All statements are 'structured'; otherwise we wouldn't understand them. Hence using the term in that sense is of no use. Similarly, all data can be put into a relational database, but that hardly makes it a database before it's there.

> Maybe I'm misunderstanding the use of these terms, but I really,
> really dislike the term "semi-structured" for data that has a
> structure. --dawn

It doesn't matter if you dislike them. I don't particularly like the fact that Codd's 'relationships' reverted to 'relations', as the former indicates the insight he added to standard mathematical relations, but I'm not going to walk around calling RM a 'Relationship'al Model now, am I?

Structured/Semi-structured/Unstructured terminology has been standard for 15 years. Even before that, in 1979, Codd himself made exactly the same distinctions, except that he used the terms 'formatted' and 'unformatted' for structured and unstructured respectively. He just never cemented the phrases.

Received on Thu Aug 17 2006 - 11:50:25 CEST
