Re: c.d.theory lexicon overview

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Thu, 22 Apr 2004 11:03:49 -0500
Message-ID: <c68qde$40f$1_at_news.netins.net>


"Senny" <sennomo_at_hotmail.com> wrote in message news:7FKhc.1320$17.155177_at_news1.epix.net...
> Having observed this discussion for a while, and having been prodded by
> one of its participants, I've decided to throw in my two cents. I have a
> rather different take on the situation from most programmers, so maybe I
> can help with a little linguistic analysis.
>
> WARNING: Much linguistic pedantry follows.
>
> PRESENT PROBLEM:
>
> Database theory lacks a precise terminology.
>
> ROOT PROBLEM:
>
> Computer science lacks a precise terminology. This is the
> real problem we have to tackle if we wish to fix the specific problem of
> database theory discussion.
>
> PAINFUL EFFECTS:
>
> A) Much confusion results from this lack of precision, which really annoys
> us all from time to time. Countless hours and thousands, possibly
> millions, of dollars per year are lost on dealing with this confusion
> worldwide. (I'm serious.) I'd rather spend time getting a project done
> than arguing over the meaning of 'object'.
>
> B) Lack of unified terminology results in different programming
> environments having distinct meanings of the same term. This complicates
> the joining of different environments.
>
> C) English may be the primary language of computing, but it is not the
> only one. Other languages usually adopt our words (with
> phonological/graphological adjustments) or translate our words to create
> new terms. This problem is really messy, but since most people don't care
> too much, I won't go beyond mentioning it.
>
> ASPECTS OF THE PROBLEM:
>
> A) Semantic overloading (Homonymity): Many word-forms we use have too many
> meanings. _function_ comes to mind.
>
> B) Semantic overlap (Synonymity and near-synonymity): To some people,
> 'domain', 'class' and 'type' mean the same thing; to others, they mean
> three distinct things; to most, they are different yet have overlapping
> meanings.
>
> C) Semantic fuzziness: Some words sort-of have one meaning, but nobody is
> quite sure what that meaning is. 'object' is a prime example. (Compare
> 'object' to 'pornography'--in both cases, many would say, "I can't tell
> you what it is, but I know it when I see it.")
>
> D) Morphological inconsistency: As James L. Ryan hinted at in his
> "Grammatical Inconsistencies" post, we say "join", "projection",
> "intersection" and "union" as nouns and "join", "project", "intersect",
> and ?"union" as verbs. Loosely speaking, in English, we have a marked
> tendency to use nouns as verbs and adjectives, verbs as nouns, and
> adjectives as nouns and verbs, etc.
>
> PROOF THAT THIS PROBLEM DOES NOT HAVE TO EXIST:
>
> Some scientific fields, namely biology and medicine, have very
> well-defined terminologies. While there may be some confusion in their
> fields, it is very limited. Our lives as programmers would be easier if we
> could attain that level of clarity.
>
> GOAL:
>
> The goal is to come as close as possible to having a one-to-one mapping
> between word-forms and word-meanings (lexemes), with allowances for
> morphological variants like plurals. Since I don't have italics in plain
> text, I'll mark forms by surrounding a form with underscore characters.
> Lexemes are in single quotes. For example, the forms _find_ and _found_
> are forms of the lexeme 'find'; likewise, _tuple_ and _tuples_ are forms
> of the lexeme 'tuple'. When you lack a one-to-one mapping, you have
> homonymity and/or synonymity. Homonymity is worse than synonymity. So,
> we want no homonyms and as few synonyms as possible.
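The goal above can be phrased as a check over a table of (form, lexeme) pairs: a form paired with more than one lexeme is a homonym, and a lexeme reachable from more than one form has synonyms. A minimal Python sketch, assuming plurals and other morphological variants have already been reduced to a base form; the sample entries and the function names `homonyms` and `synonyms` are hypothetical illustrations, not an agreed c.d.t. lexicon.

```python
from collections import defaultdict

# Hypothetical sample lexicon of (form, lexeme) pairs. Variants such as
# plurals are assumed to be normalized to a base form already.
PAIRS = [
    ("tuple", "tuple"),
    ("function", "mathematical function"),
    ("function", "subroutine"),
    ("domain", "type"),
    ("type", "type"),
]

def homonyms(pairs):
    """Forms that map to more than one lexeme (the worse case)."""
    by_form = defaultdict(set)
    for form, lexeme in pairs:
        by_form[form].add(lexeme)
    return {f: ls for f, ls in by_form.items() if len(ls) > 1}

def synonyms(pairs):
    """Lexemes that are reachable from more than one form."""
    by_lexeme = defaultdict(set)
    for form, lexeme in pairs:
        by_lexeme[lexeme].add(form)
    return {l: fs for l, fs in by_lexeme.items() if len(fs) > 1}

print(homonyms(PAIRS))
print(synonyms(PAIRS))
```

Under this framing, the stated goal is an empty result from `homonyms` and as small a result as possible from `synonyms`.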
>
> HOW TO APPROACH THE GOAL:
>
> There is no absolutely right way to solve the problem. However, here are
> my starting suggestions.
>
> A) Weed out our current vocabulary.
>
> 1) Kill-file for serious offenders: Some forms are so ambiguous that they
> have to be removed from the lexicon. Here are just a few: _object_,
> _function_, _attribute_, _entity_, _domain_. These forms represent so
> many different things that even in limited contexts they can still be
> confusing. Using these words in computer science is kind of like a
> biologist using the word "creature".

I was with you up to this point -- the word "function" has a very precise definition, agreed upon by mathematicians for quite some time. The term "relation", on the other hand, seems to have had its meaning morphed a few times by computer scientists, even though what it is in mathematical terms is quite clear. So, add in "relations" and remove "functions" and then I'll understand your point better ;-)
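For reference, the standard mathematical definition being appealed to here: a function f from A to B is a relation, that is, a set of ordered pairs drawn from A × B, in which every element of A is paired with exactly one element of B. A small Python check of that condition over finite sets; `is_function` is a hypothetical helper name used only for illustration.

```python
def is_function(pairs, domain):
    """Return True if the set of (x, y) pairs is a total function on
    `domain`: every x in the domain appears with exactly one y, and no
    pair uses an x outside the domain."""
    images = {}
    for x, y in pairs:
        if x not in domain:
            return False          # pair falls outside the stated domain
        if x in images and images[x] != y:
            return False          # x is related to two different values
        images[x] = y
    return set(images) == set(domain)  # every x must have an image

square = {(x, x * x) for x in range(-2, 3)}
print(is_function(square, set(range(-2, 3))))
print(is_function({(4, 2), (4, -2)}, {4}))  # "square root" is not a function
```

The second call shows the distinction at work: the pairs {(4, 2), (4, -2)} form a perfectly good relation, but not a function, because 4 is paired with two values.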

I also think that when a particular vendor uses a term inappropriately, we don't have to rule out the term -- we can simply indicate that what we are dealing with is an "XYZ Relation" which might not look anything like what we define a relation to be as an industry.

> 2) Reconsideration of minor offenders: Some forms are somewhat confusing,
> but do not cause *too* much time-wasting debate, e.g., _relation_,
> _pointer_, _process_, _thread_, _operating system_, _network_, _array_,
> _byte_, _window_, _drive_, _binary_, _null_. In most cases, context
> suffices to remove ambiguity. We can keep such forms, but it would be
> nice if we could find better ones. Probably at least some of the meanings of
> each of these forms should be provided new forms.

I can see that building these lists of big and minor offenders could lead to too many not-very-productive discussions. So, if we don't work to distinguish between what is a major and what is a minor problem, I think we will get further faster.

I figure that such terms as "relation", while abused by many vendors and while some database theorists have chosen to remove it from its root of being a mathematical relation, can simply have multiple definitions; those using the not-as-favored definition for cdt can put a qualifier in front of the term. For example, if we opt for "relation" to be what C.J. Date defines to be a relation, rather than using the definition from mathematics, then when I mean a relation in the mathematical sense I can use the term "mathematical relation". So, how about if we have a primary definition for each term and then use adjectives for any secondary versions that will be useful to us?
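To make the two senses of "relation" concrete: in mathematics, a binary relation is any subset of a Cartesian product, and its components are identified by position; in C.J. Date's usage, a relation has a heading of named, typed attributes and a body of tuples, and attribute order carries no meaning. The Python sketch below is only an illustration under those assumptions. The frozenset-of-pairs representation is an editorial choice, not Date's notation, and names such as `less_than`, `heading`, `body` and `conforms` are hypothetical.

```python
from itertools import product

# Mathematical sense: a binary relation on A x B is any subset of the
# Cartesian product. Components are identified by position.
A = {1, 2, 3}
B = {1, 2, 3}
less_than = {(a, b) for a, b in product(A, B) if a < b}

# Date-style sense (sketch): a heading of attribute names with types, and a
# body of tuples. Each tuple is stored as a frozenset of (name, value) pairs,
# so attribute order carries no meaning.
heading = {"lo": int, "hi": int}
body = {frozenset({"lo": a, "hi": b}.items()) for a, b in less_than}

def conforms(heading, body):
    """True if every tuple has exactly the heading's attribute names and
    each value is an instance of the declared type."""
    return all(
        dict(t).keys() == heading.keys()
        and all(isinstance(v, heading[k]) for k, v in t)
        for t in body
    )

# Position matters in the mathematical relation...
print((1, 2) in less_than, (2, 1) in less_than)
# ...but names, not order, identify components in the Date-style tuple.
print(frozenset({("hi", 2), ("lo", 1)}) in body)
print(conforms(heading, body))
```

With a qualifier convention like the one proposed, the first structure would be called a "mathematical relation" and the second simply a "relation", or the other way round, depending on which definition cdt adopts as primary.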

> 3) Identification of "good" forms: Some current forms are unambiguous
> enough to keep around, e.g., _tuple_, _socket_, _integer_, _bit_, _octet_,
> _signed_, _modem_, _processor_.
>
> B) Assign forms to replace the ones we got rid of.

Getting rid of common words seems like a poor idea -- just define a primary version of it for cdt and use adjectives, as I indicated above.

> 1) Take unused words from English and apply them in a specific sense.
> Most computing terms in fact come from everyday English. This approach
> seems convenient at first, but usually causes more confusion in the long
> run--after all, several times over, somebody thought that assigning the
> form _function_ to yet another lexeme was a good idea.
>
> 2) Derive new forms from accepted forms. Numerous current terms were
> created this way: _unsigned integer_, _bit_ (from _binary digit_),
> _byte_ (from _bit_), _nybble_ (from _byte_), _varchar_ (from _variable
> character_), _modem_ (from _modulator/demodulator_), _download_ (from
> _down_ and _load_), etc. Many acronyms which later became accepted as
> terms in their own right come from this approach, e.g., _FTP_, _DNS_,
> _MIME_, _SQL_, _RAM_, _grep_, _RFC_, etc. The success of this approach
> depends on the clarity of the base forms used to create the derived forms.

agreed.

> 3) Adopt words from other languages. English is full of adopted words,
> mostly from Old French (thanks to the Normans). More recently, science
> and philosophy have introduced lots of Latin and Greek words. Here are just a
> few such words we use in computing: 'cache' (from French), 'integer' (from
> Latin), 'predicate' (from Latin), 'algorithm' (from Arabic), 'algebra'
> (from Arabic), 'calculus' (from Latin). We tend to adopt words when we
> can't find one we already have that quite fits.

OK.

<snip>
> CONCLUDING REMARKS:
>
> I hope I have adequately explained the scope of the problem. It's bigger
> than most people realize. I don't know if it will be conquered, but I
> know it can be. After all, physicians didn't always have the precise
> terminology they have today. There are many other details I could
> discuss, but they're not necessary for an overview of the problem.

Thanks for piping up on this, Senny -- very helpful. I'd like to start without kicking out any terms, however. I really doubt if many people would disagree with the definition of function that I provided, for example.

Cheers! --dawn

Received on Thu Apr 22 2004 - 18:03:49 CEST
