Re: It don't mean a thing ...

From: Chris Hoess <choess_at_stwing.upenn.edu>
Date: Thu, 10 Jun 2004 01:08:06 +0000 (UTC)
Message-ID: <slrnccfd3m.f1q.choess_at_force.stwing.upenn.edu>


In article <40bcfe7e$0$37789$e4fe514c_at_news.xs4all.nl>, mAsterdam wrote:
> Eric Kaun wrote:

>> I take the stance that data on its own does have meaning, or at least that
>> meaning gives it a useful definition. Without meaning (imbued by virtue of
>> some reference, e.g. the business that wants to use it), it's just... bits?
>> Facts? To me, the word "data" makes a useful distinction between phenomena
>> in some raw, perceived-yet-unprocessed state, and that with which we need to
>> work.

>
> This is much closer to what I thought was meant when people used the
> word data. But this wide-spread definition suggests we were both wrong,
> doesn't it? Language is as language does. I do not pretend I can
> redefine it on my own. I can, however, change my own choice of words.
> I know what I like thinking about, and it is not data as it is defined
> there.
>
> But maybe (I hope) it is simply a mistake, copied all over the place.
> That is why I also asked (as yet unanswered) for a source of the definition.
>
>> In any event, applications use the meaning of the data. Nearly every app,
>> regardless of where it gets its data, makes assumptions about what's stored,
>> its format, columns, relation heading, whatever. Even very dynamic apps,
>> with interpreters for domain languages, make some assumptions. Those
>> assumptions are the meaning, or at least require that the meaning be
>> "enforced". Those assumptions are critical to allowing more than one
>> application to deal usefully with business data.

>
> Sharing. Sharing has costs and benefits.
>

Perhaps some of the confusion here is coming from the fact that the word "data" is used indiscriminately to describe what are perhaps two separate concepts. If we are speaking of "data" as information about the real world (or, for that matter, some imagined one), such as "John owns 2 cars" or "The solution has a concentration of 5M", I agree that there is meaning in data. However, the word "data" is also used to describe the bits flowing around inside our databases, and I would propose that this is not strictly data; these are representations of data. These representations *when combined with the semantic interpretation* form the data.

A nice statement of this dichotomy occurs, interestingly enough, in the SGML standard. The "document type definition" (IIRC; I have to check the standard for some of the fine points of nomenclature) comprises two parts. One is the machine-readable "DTD" which defines the grammar of a class of documents insofar as SGML allows it to. However, the other part of the document type definition is the collection of semantic rules for interpretation of the document. Again, IIRC, under the SGML definition of validity, an SGML document which conforms to the grammar (that is, has been declared valid within the limits of SGML validation by machine) is not valid if it does not comply with the semantics of the document type definition.

It's easy to refer to these representations of data as data because we usually think of them as such; generally, we look at some atom from a database and don't think of it as a bit representation, but attach its semantic interpretation (which we know, or think we know, based on familiarity with the database and perhaps various convenient assumptions). But it's possible for people to attach different meanings to the same representation, usually disastrously; "12" in column "LENGTH" becomes 12 m or 12 ft., depending on who reads it (and what axis does "LENGTH" apply to, anyway?). So I'd say that while data does have meaning, that meaning doesn't pass the "barrier of semantic interpretation" around the database. (This could be an application layer, but it doesn't need to be; a README file explaining the meaning of each column and table could suffice.)
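
To make the "12" example concrete, here is a minimal Python sketch of a representation being combined with a semantic interpretation. The names ColumnMeaning, interpret and METRES_PER_UNIT are made up for illustration and don't come from any real system; the point is only that the stored value is identical in both cases and the meaning is supplied from outside it.

```python
from dataclasses import dataclass

# Hypothetical conversion table; the unit names are assumptions for illustration.
METRES_PER_UNIT = {"m": 1.0, "ft": 0.3048}


@dataclass(frozen=True)
class ColumnMeaning:
    """The semantic interpretation, which lives outside the stored value."""
    unit: str  # what one unit of the stored number stands for
    axis: str  # which dimension "LENGTH" is measured along


def interpret(raw: str, meaning: ColumnMeaning) -> float:
    """Combine a stored representation with its interpretation; result in metres."""
    return float(raw) * METRES_PER_UNIT[meaning.unit]


raw_length = "12"  # the representation: all the database actually holds

# Two readers attach different meanings to the same representation.
as_metres = interpret(raw_length, ColumnMeaning(unit="m", axis="x"))
as_feet = interpret(raw_length, ColumnMeaning(unit="ft", axis="x"))

print(as_metres, as_feet)
```

The ColumnMeaning object here plays the part of the README (or application layer): without it, raw_length is just a string.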

Thoughts? Am I making sense here?

-- 
Chris Hoess
Received on Thu Jun 10 2004 - 03:08:06 CEST
