Re: Resiliency To New Data Requirements

From: erk <eric.kaun_at_gmail.com>
Date: 18 Aug 2006 04:06:00 -0700
Message-ID: <1155899160.458053.122320_at_b28g2000cwb.googlegroups.com>


dawn wrote:
> erk wrote:
> > I don't think so. It's the endless "what vs. how" debate - it goes all
> > the way down and all the way up. Which is which depends on your point
> > of view.

I was sloppy here - "context" is much better than "point of view," although in this case I specifically meant abstraction context.

> Similarly, I would guess that if one were to look at name-value pairs,
> csv files, relational tables, and Pick files, one could say they are
> all structured data in that they could all be used similarly, even if
> they are different structures.

While it's obvious which of these is best (ahem), they're all superior to XML, which requires a mangling of the concept of "datum" or "document" to act as a "structure manager" (or "container," if you prefer - ah, maybe I'll use the term "aggregate"). It requires something else; as such, it's a type.

> Toss in a graphic of a monkey or a pair
> of shoes and you might suggest that even if structured, these are of a
> different ilk.

As steganography shows, any value which relies on imprecise human "parsing" (e.g. looking at a photograph and figuring out what it is) can be used as an aggregate, though the semantics will be awkward and fall far short of relations.

> So, yes, I think you can talk about structured and unstructured data,
> and I don't think it is any harder for us to determine whether
> something is one or the other than it is to determine whether a
> structure is an abode or not (recognizing there could be grey areas in
> each, but in most instances it would be clear).

My original point (badly expressed, as I re-read) is that the structure depends on one's context. To a DBMS administrator, the structure is a series of file system directories, with each multi-megabyte file being an unstructured blob. To a DBMS user, the structure is a database (or relation within the database).
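A rough Python sketch of those two contexts, looking at the very same bytes. The file contents, names, and both function names are made up for illustration; CSV stands in for whatever storage format a real DBMS would use.

```python
import csv
import io

# Hypothetical file contents, invented for this illustration.
raw = b"name,city\nSimon,Austin\nDawn,Ames\n"

def admin_view(blob: bytes) -> int:
    """File-system context: the content is opaque, only its size is visible."""
    return len(blob)

def user_view(blob: bytes) -> set:
    """Database context: the same bytes read as a relation (a set of tuples)."""
    reader = csv.reader(io.StringIO(blob.decode("utf-8")))
    next(reader)  # skip the header row (attribute names)
    return {tuple(row) for row in reader}
```

Neither view is "the" structure; which one applies depends on who is asking and why.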

The whole point of semi-structured (and I do like Jan Hidders's definition, though it begs exploration) is that since we don't know the schema for a given piece of a structure, our only avenue is human intuition, not automated reasoning. Hence not data.

> It is function, which includes context (the igloo would not work as
> well in Texas as in Alaska), and not shape that determines whether a
> structure is an abode.

When we first adopted him, our cat Simon's ear was an abode for ear mites. It is no longer. Did it start as structured, and become unstructured?

While some of the specifics strain credulity, the structured vs. unstructured distinction depends on context. That's much of the argument around XML - it's obviously an inferior aggregate compared with relations, and hence less suitable to data management. From which context? From the context of business ("application domain") rules - aka predicates. Relations provide the most general and powerful "aggregation mechanism" (I hate the term, but don't care to find a better one right now) in this context, the context most appropriate to software development (since it's useful from requirements through implementation and beyond).

> > Structures are more or less useful in different contexts, and
> > ground our viewpoints.

More sloppy wording - instead of "ground our viewpoints," I should have said "ground our discussions by establishing the context or scope."

> I agree with the first part of the sentence. I don't have to be stuck
> on houses that would work in Iowa to be able to tell that an igloo is
> still a possible abode for someone else. In other words, I don't need
> to employ my bias for one structure or another when identifying it (for
> the most part).

Evaluating alternatives in a given context doesn't require bias - my terminology might have skewed this point.

> > And I think it's only in information exchange that the issue of parsing
> > (translating a blob of characters or bytes into a structure) is so
> > important.
>
> If that translates to "it is only when you are doing something with
> data that you care about functions that operate on that type of data"
> then sure.

I meant that it's only in the context of information exchange that "semi-structured" has any meaning at all (e.g. someone receives something and both don't know and don't care about the type or domain of part of it).

> That is why I can't (yet, there is still hope) accept lists
> only as values for attributes and not known to the model at a lower (or
> higher) level, for example. cheers! --dawn

You said "sure" above, meaning that you care about functions on a type of data - that's all in the definition of a type. Why would you care whether the DBMS "cares" about lists as long as you do, and can use them?
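To make the question concrete, here is a minimal Python sketch, not a description of any actual DBMS. The relation, its values, and the `restrict` helper are hypothetical. The relational operator knows nothing about lists; the list-specific function (`len`) comes from the attribute's type.

```python
# Hypothetical relation: each tuple is (person, phone_numbers), where
# phone_numbers is a value of a list-like type (a Python tuple here).
people = {
    ("dawn", ("555-0100", "555-0101")),
    ("erk", ("555-0199",)),
}

def restrict(relation, predicate):
    """Relational restriction: keep the tuples that satisfy the predicate.
    It has no knowledge of lists; it only evaluates the predicate."""
    return {t for t in relation if predicate(t)}

# The list operator (len) belongs to the attribute's type, not to restrict.
multi = restrict(people, lambda t: len(t[1]) > 1)
```

The user cares about lists and can use them, while the model's operators stay unchanged.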

  • erk
Received on Fri Aug 18 2006 - 13:06:00 CEST