Re: It don't mean a thing ...

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Thu, 10 Jun 2004 09:22:12 +0200
Message-ID: <40c80c24$0$48933$e4fe514c_at_news.xs4all.nl>


Chris Hoess wrote:

> mAsterdam wrote:
>>Eric Kaun wrote:
>>
>>>I take the stance that data on its own does have meaning, or at least that
>>>meaning gives it a useful definition. Without meaning (imbued by virtue of
>>>some reference, e.g. the business that wants to use it), it's just... bits?
>>>Facts? To me, the word "data" makes a useful distinction between phenomena
>>>in some raw, perceived-yet-unprocessed state, and that with which we need to
>>>work.
>>
>>This is much closer to what I thought was meant when people used the 
>>word data. But this wide-spread definition suggests we were both wrong, 
>>doesn't it? Language is as language does. I do not pretend I can 
>>redefine it on my own. I can, however, change my own choice of words.
>>I know what I like thinking about, and it is not data as it is defined 
>>there.
>>
>>But maybe (I hope) it is simply a mistake, copied all over the place. 
>>That is why I also asked (as yet unanswered) for a source of the definition.
>>
>>>In any event, applications use the meaning of the data. Nearly every app,
>>>regardless of where it gets its data, makes assumptions about what's stored,
>>>its format, columns, relation heading, whatever. Even very dynamic apps,
>>>with interpreters for domain languages, make some assumptions. Those
>>>assumptions are the meaning, or at least require that the meaning be
>>>"enforced". Those assumptions are critical to allowing more than one
>>>application to deal usefully with business data.
>>
>>Sharing. Sharing has costs and benefits.

>
> Perhaps some of the confusion here is coming from the fact that the word
> "data" is used indiscriminately to describe what are perhaps two separate
> concepts. If we are speaking of "data" as information about the real world
> (or, for that matter, some imagined one), such as "John owns 2 cars" or
> "The solution has a concentration of 5M", I agree that there is meaning in
> data. However, the word "data" is also used to describe the bits flowing
> around inside our databases, and I would propose that this is not strictly
> data; these are representations of data. These representations *when
> combined with the semantic interpretation* form the data.

I have seen the definitions that state this meaning of the word data as bits and bytes. I just haven't seen the word data actually used that way (except to define information as 'meaningful' data).

Do people really talk about 'irrelevant information'? Not in my experience. They could say 'this data is irrelevant, give me some information!' Information is (again in my experience) mostly used with the connotation of 'relevant', sometimes 'new'.

Even when talking about hexdumps people are wondering what data is really there: what does it mean? (I admit they could also say: what information is there?)

Why do we talk about 'database', not 'informationbase'? I think it is because not all (meaningful) content of a database is 'informational' (i.e. relevant or new).

In my personal experience the actual use of the words data and information is more refined than the often quoted definitions (information as data with meaning).

Strange thing: the supposedly strict definitions do not reflect actual use. Instead of adding precision they just blur a useful distinction. They don't help. I would prefer definitions that honour the distinctions people actually make: information as relevant data.

> A nice statement of this dichotomy occurs, interestingly enough, in the
> SGML standard. The "document type definition" (IIRC; I have to check the
> standard for some of the fine points of nomenclature) comprises two parts.
> One is the machine-readable "DTD" which defines the grammar of a class of
> documents insofar as SGML allows it to. However, the other part of the
> document type definition is the collection of semantic rules for
> interpretation of the document. Again, IIRC, under the SGML definition of
> validity, an SGML document which conforms to the grammar (that is, has
> been declared valid within the limits of SGML validation by machine) is
> not valid if it does not comply with the semantics of the document type
> definition.
>
> It's easy to refer to these representations of data as data because we
> usually think of them as such; generally, we look at some atom from a
> database and don't think of it as a bit representation, but attach its
> semantic interpretation (which we know, or think we know, based on
> familiarity with the database and perhaps various convenient assumptions).
> But it's possible for people to attach different meanings to the same
> representation, usually disastrously; "12" in column "LENGTH" becomes 12m
> or 12 ft., depending (and what axis does "LENGTH" apply to, anyway). So
> I'd say that while data does have meaning, that meaning doesn't pass the
> "barrier of semantic interpretation" around the database. (This could be
> an application layer, but it doesn't need to be; a README file explaining
> the meaning of each column and table could suffice).

I'll try to get this barrier sharp:
Under a closed world assumption, values of type LENGTH may be kept in abstract units, without stating the actual unit. However, to interpret these values *outside* the closed world we *need* an associated unit.

The predicate (as used in The Third Manifesto) serves as the README for a relational variable.
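
A minimal sketch of both points, not anyone's actual schema: the bare 12 is usable inside the closed world, but crossing the barrier forces a choice of unit, and a predicate turns a stored tuple into a statement. The names `Length`, `METRES_PER_UNIT`, `PART_PREDICATE` and `read_tuple`, and the part called "beam", are all made up for illustration.

```python
from dataclasses import dataclass

# Assumption: metres per unit, just enough for the 12 m vs. 12 ft example.
METRES_PER_UNIT = {"m": 1.0, "ft": 0.3048}

# Hypothetical predicate for a relvar with attributes NAME and LENGTH.
# It plays the README role: it says what a stored tuple asserts.
PART_PREDICATE = "Part {name} is {length} {unit} long."


@dataclass(frozen=True)
class Length:
    """A length that carries its unit, so it can leave the closed world."""
    value: float
    unit: str

    def to_metres(self) -> float:
        return self.value * METRES_PER_UNIT[self.unit]


def read_tuple(name: str, raw_length: float, unit: str) -> str:
    """Instantiate the predicate for one stored tuple."""
    return PART_PREDICATE.format(name=name, length=raw_length, unit=unit)


raw = 12                      # inside the closed world: just "12"
in_metres = Length(raw, "m")  # one reading of the same representation
in_feet = Length(raw, "ft")   # another, quite different, reading

statement = read_tuple("beam", raw, "m")
```

The same stored 12 yields two different lengths depending on the unit attached at the barrier; the predicate is what records which reading was intended.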

> Thoughts? Am I making sense here?

I think so.
Are we talking about the same things here? I think so. (At least: I hope so :-)
