Re: It don't mean a thing ...

From: Chris Hoess <choess_at_stwing.upenn.edu>
Date: Fri, 18 Jun 2004 02:32:37 +0000 (UTC)
Message-ID: <slrncd4l22.e2r.choess_at_force.stwing.upenn.edu>


In article <40c80c24$0$48933$e4fe514c_at_news.xs4all.nl>, mAsterdam wrote:

> Chris Hoess wrote:

>> A nice statement of this dichotomy occurs, interestingly enough, in the
>> SGML standard. The "document type definition" (IIRC; I have to check the
>> standard for some of the fine points of nomenclature) comprises two parts.
>> One is the machine-readable "DTD" which defines the grammar of a class of
>> documents insofar as SGML allows it to. However, the other part of the
>> document type definition is the collection of semantic rules for
>> interpretation of the document. Again, IIRC, under the SGML definition of
>> validity, an SGML document which conforms to the grammar (that is, has
>> been declared valid within the limits of SGML validation by machine) is
>> not valid if it does not comply with the semantics of the document type
>> definition.
>>
>> It's easy to refer to these representations of data as data because we
>> usually think of them as such; generally, we look at some atom from a
>> database and don't think of it as a bit representation, but attach its
>> semantic interpretation (which we know, or think we know, based on
>> familiarity with the database and perhaps various convenient assumptions).
>> But it's possible for people to attach different meanings to the same
>> representation, usually disastrously; "12" in column "LENGTH" becomes 12m
>> or 12 ft., depending (and what axis does "LENGTH" apply to, anyway). So
>> I'd say that while data does have meaning, that meaning doesn't pass the
>> "barrier of semantic interpretation" around the database. (This could be
>> an application layer, but it doesn't need to be; a README file explaining
>> the meaning of each column and table could suffice).
> 
> I'll try to get this barrier sharp:
> Under a closed world assumption any value of type LENGTH
> may sometimes be in abstract units without denoting
> the actual units - however, to interpret these values
> *outside* the closed world we *need* an associated unit.
> 
> The predicate (as used in the 3rd manifesto) serves
> as the README for a relational variable.
> 
>> Thoughts? Am I making sense here?
> 
> I think so.
> Are we talking about the same things here?
> I think so. (At least: I hope so :-)

Yes, I also think so. This also ties in to the discussion here about "human-readable" languages. What is the advantage of such a language, after all? Verbosity is a disadvantage to the programmer: it means more typing to express himself. The reason we are interested is this "invisible" semantic layer. Because the semantics are concepts, ideas in people's heads that can't be directly copied and passed around, the programmer may not have gotten them quite right. So we want a language where the "digital" portion of the constraints (values, which columns they apply to, etc.) can easily be coupled to the "semantics" and checked for real-world correctness not by the programmer, but by someone with "domain knowledge". This is also why pushing constraints into the application is dangerous: each application programmer has to decompose the real-world constraints into the "semantic" and "digital" portions, and it's easy to make a mistake with the semantics. Better to do it once and do it right.

(For that matter, it's a bit of a strawman to distinguish between constraints "stored" in the application and constraints "stored" in the database. If the DBMS has a good system catalog, the application should be able to query the DBMS for the relevant constraints as it initializes itself and then apply them locally, so centralized storage of constraints doesn't mean that they can only be checked at the database level.)
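
As a rough illustration of the catalog idea, here is a minimal sketch, assuming SQLite as the DBMS and using only NOT NULL declarations as the "constraints". The table `part`, its columns, and the helper names `load_not_null_columns` and `missing_required` are hypothetical, made up for this example; a DBMS with a richer catalog could expose far more than this.

```python
import sqlite3


def load_not_null_columns(conn, table):
    """Ask the DBMS catalog which columns of `table` are declared NOT NULL."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    # `table` is a trusted, hard-coded name in this sketch.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {name for _cid, name, _type, notnull, _dflt, _pk in rows if notnull}


def missing_required(row, required):
    """Return the required columns that `row` (a dict) leaves unset."""
    return sorted(col for col in required if row.get(col) is None)


# Hypothetical schema: the constraint is declared once, in the database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part ("
    " id INTEGER PRIMARY KEY,"
    " length_m REAL NOT NULL,"  # unit spelled out in the column name
    " note TEXT)"
)

# The application loads the constraints at start-up instead of hard-coding them.
required = load_not_null_columns(conn, "part")

# It can then reject bad input before ever sending it to the database.
problems = missing_required({"id": 1, "note": "no length given"}, required)
print(problems)
```

The constraint still lives in one place; the application only borrows a copy of it at start-up, so a schema change does not require touching the application's validation code.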

Looking back, most of this post seems to be stating the obvious, but then again, nothing is really obvious in c.d.t...

-- 
Chris Hoess
Received on Fri Jun 18 2004 - 04:32:37 CEST
