Re: c.d.theory lexicon overview

From: Alan <alan_at_erols.com>
Date: Thu, 22 Apr 2004 12:46:13 -0400
Message-ID: <c68ssm$87c51$1_at_ID-114862.news.uni-berlin.de>


Is that an abstract of your Ph.D. thesis?

"Senny" <sennomo_at_hotmail.com> wrote in message news:7FKhc.1320$17.155177_at_news1.epix.net...
> Having observed this discussion for a while, and having been prodded by one
> of its participants, I've decided to throw in my two cents. I have a
> rather different take on the situation from most programmers, so maybe I
> can help with a little linguistic analysis.
>
> WARNING: Much linguistic pedantry follows.
>
> PRESENT PROBLEM:
>
> Database theory lacks a precise terminology.
>
> ROOT PROBLEM:
>
> Computer science lacks a precise terminology. This is the
> real problem we have to tackle if we wish to fix the specific problem of
> database theory discussion.
>
> PAINFUL EFFECTS:
>
> A) Much confusion results from this lack of precision, which really annoys
> us all from time to time. Countless hours and thousands, possibly
> millions, of dollars per year are lost on dealing with this confusion
> worldwide. (I'm serious.) I'd rather spend time getting a project done
> than arguing over the meaning of 'object'.
>
> B) Lack of unified terminology results in different programming environments
> having distinct meanings of the same term. This complicates the joining of
> different environments.
>
> C) English may be the primary language of computing, but it is not the only
> one. Other languages usually adopt our words (with
> phonological/graphological adjustments) or translate our words to create
> new terms. This problem is really messy, but since most people don't care
> too much, I won't go beyond mentioning it.
>
> ASPECTS OF THE PROBLEM:
>
> A) Semantic overloading (Homonymity): Many word-forms we use have too many
> meanings. _function_ comes to mind.
>
> B) Semantic overlap (Synonymity and near-synonymity): To some people,
> 'domain', 'class' and 'type' mean the same thing; to others, they mean three
> distinct things; to most, they are different yet have overlapping meaning.
>
> C) Semantic fuzziness: Some words sort-of have one meaning, but nobody is
> quite sure what that meaning is. 'object' is a prime example. (Compare
> 'object' to 'pornography'--in both cases, many would say, "I can't tell you
> what it is, but I know it when I see it.")
>
> D) Morphological inconsistency: As James L. Ryan hinted at in his
> "Grammatical Inconsistencies" post, we say "join", "projection",
> "intersection" and "union" as nouns and "join", "project", "intersect", and
> ?"union" as verbs. Loosely speaking, in English, we have a marked tendency
> to use nouns as verbs and adjectives, and verbs as nouns, adjectives as
> nouns and verbs, etc.
>
> PROOF THAT THIS PROBLEM DOES NOT HAVE TO EXIST:
>
> Some scientific fields, namely biology and medicine, have very well-defined
> terminologies. While there may be some confusion in their fields, it is
> very limited. Our lives as programmers would be easier if we could attain
> that level of clarity.
>
> GOAL:
>
> The goal is to come as close as possible to having a one-to-one mapping
> between word-forms and word-meanings (lexemes), with allowances for
> morphological variants like plurals. Since I don't have italics in plain
> text, I'll mark forms by surrounding a form with underscore characters.
> Lexemes are in single quotes. For example, the forms _find_ and _found_
> are forms of the lexeme 'find'; likewise, _tuple_ and _tuples_ are forms of
> the lexeme 'tuple'. When you lack a one-to-one mapping, you have
> homonymity (one form for several lexemes) and/or synonymity (several forms
> for one lexeme). Homonymity is worse than synonymity. So, we
> want no homonyms and as few synonyms as possible.
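
The one-to-one mapping goal can be made concrete with a small sketch. The
Python below is only an illustration under assumptions: the sample
(form, lexeme) pairs and the function names are hypothetical, not taken from
the post, and forms are assumed to be already reduced to a base form, so
morphological variants such as plurals are ignored. It reports forms shared
by several lexemes (homonyms) and lexemes reachable through several forms
(synonyms).

```python
from collections import defaultdict

# Hypothetical sample lexicon: each pair is (form, lexeme). The meanings
# listed here are illustrative assumptions, not an authoritative inventory.
USAGES = [
    ("function", "mathematical mapping"),
    ("function", "callable procedure"),
    ("function", "purpose"),
    ("type", "set of values"),
    ("domain", "set of values"),
    ("tuple", "tuple"),
]


def find_homonyms(usages):
    """Return forms that are attached to more than one lexeme."""
    lexemes_by_form = defaultdict(set)
    for form, lexeme in usages:
        lexemes_by_form[form].add(lexeme)
    return {form: lexemes
            for form, lexemes in lexemes_by_form.items()
            if len(lexemes) > 1}


def find_synonyms(usages):
    """Return lexemes that are expressed by more than one form."""
    forms_by_lexeme = defaultdict(set)
    for form, lexeme in usages:
        forms_by_lexeme[lexeme].add(form)
    return {lexeme: forms
            for lexeme, forms in forms_by_lexeme.items()
            if len(forms) > 1}


def is_one_to_one(usages):
    """True when the lexicon has neither homonyms nor synonyms."""
    return not find_homonyms(usages) and not find_synonyms(usages)


if __name__ == "__main__":
    print("homonyms:", find_homonyms(USAGES))
    print("synonyms:", find_synonyms(USAGES))
    print("one-to-one:", is_one_to_one(USAGES))
```

With this sample data, _function_ is flagged as a homonym and the lexeme
shared by _type_ and _domain_ is flagged as having synonyms, so the lexicon
is not one-to-one.
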
>
> HOW TO APPROACH THE GOAL:
>
> There is no absolutely right way to solve the problem. However, here are my
> starting suggestions.
>
> A) Weed out our current vocabulary.
>
> 1) Kill-file for serious offenders: Some forms are so ambiguous, they
> have to be removed from the lexicon. Here are just a few: _object_,
> _function_, _attribute_, _entity_, _domain_. These forms represent so many
> different things that even in limited contexts they can still be confusing.
> Using these words in computer science is kind of like a biologist using the
> word "creature".
>
> 2) Reconsideration of minor offenders: Some forms are somewhat confusing,
> but do not cause *too* much time-wasting debate, e.g., _relation_,
> _pointer_, _process_, _thread_, _operating system_, _network_, _array_,
> _byte_, _window_, _drive_, _binary_, _null_. In most cases, context
> suffices to remove ambiguity. We can keep such forms, but it would be nice
> if we could find better ones. Probably at least some of the meanings of
> each of these forms should be given new forms.
>
> 3) Identification of "good" forms: Some current forms are unambiguous
> enough to keep around, e.g., _tuple_, _socket_, _integer_, _bit_, _octet_,
> _signed_, _modem_, _processor_.
>
> B) Assign forms to replace the ones we got rid of.
>
> 1) Take unused words from English and apply them in a specific sense. Most
> computing terms in fact come from everyday English. This approach seems
> convenient at first, but usually causes more confusion in the long
> run--after all, several times over, somebody thought that assigning the form
> _function_ to yet another lexeme was a good idea.
>
> 2) Derive new forms from accepted forms. Numerous current terms were
> created this way: _unsigned integer_, _bit_ (from _binary digit_), _byte_
> (from _bit_), _nybble_ (from _byte_), _varchar_ (from _variable
> character_), _modem_ (from _modulator/demodulator_), _download_ (from
> _down_ and _load_), etc. Many acronyms which later became accepted as
> terms in their own right come from this approach, e.g., _FTP_, _DNS_,
> _MIME_, _SQL_, _RAM_, _grep_, _RFC_, etc. The success of this approach
> depends on the clarity of the base forms used to create the derived forms.
>
> 3) Adopt words from other languages. English is full of adopted words,
> mostly from Old French (thanks to the Normans). More recently, science and
> philosophy have introduced lots of Latin and Greek words. Here are just a
> few such words we use in computing: 'cache' (from French), 'integer' (from
> Latin), 'predicate' (from Latin), 'algorithm' (from Arabic), 'algebra'
> (from Arabic), 'calculus' (from Latin). We tend to adopt words when we
> can't find one we already have that quite fits.
>
> 4) Make up a new form basically out of nowhere. Such forms usually come
> from some form of psychological association with an existing idea. Here
> are a few: 'dongle', '404' (from the HTTP protocol), 'baud' (based on the
> name Baudot), 'bogon' (based on 'bogus'), 'frob', 'kluge' (perhaps from
> German or Polish, but not in its current sense), 'munge', 'boolean' (based
> on the name 'Boole'), 'gaussian' (based on the name 'Gauss'), 'spam',
> 'swizzle'. Such words are rare, because even though everybody invents new
> words from time to time, they rarely catch on. After all, if I decide that
> the procedure kind of _function_ should now be called _meklor_, who would
> go along with it?
>
> BARRIERS TO SOLVING THE PROBLEM:
>
> A) Tradition: We inherit words from people already using them, whether
> they're good or not. Computer science (and especially database theory)
> inherited much from mathematics, a field with a fuzzy, context-dependent
> terminology (mostly evident in its notation). There has never been a major
> concerted effort to clarify our vocabulary. We have thousands of terms in
> today's computing lexicon, and at least hundreds (including some of the
> most common ones!) are problematic. How can we fight this?
>
> B) Ad-hockery: When people come up with a new idea, they often hastily
> assign some label to it without much consideration for the future. Worse
> yet, they often usurp an existing form. This is especially the case in our
> field. For example, a Java 'attribute' is *nothing like* a C# 'attribute'
> (but rather a C# 'field'). (Aside: If I see one more new use of the forms
> _attribute_ or _function_, I might puke.) Even if we successfully combat
> tradition, we have to beware of ad-hockery, or the problem will reappear.
>
> C) Conceptual confusion: Computing is so new and so constantly changing that
> in many senses, we don't know what we're doing. It's hard to put a label
> on a concept that we can't put our finger on, so to speak. Context is a
> major issue. We have layers of abstraction upon abstraction. The layering
> effect gives rise to differentiation between terms like 'primitive type' and
> 'abstract data type' or 'type' and 'class'. A variable's 'value' can be
> another 'variable'. Are 'type' and 'domain' the same thing? If so, how? I
> could go on forever on this point, so I'll stop here.
>
> CONCLUDING REMARKS:
>
> I hope I have adequately explained the scope of the problem. It's bigger
> than most people realize. I don't know if it will be conquered, but I know
> it can be. After all, physicians didn't always have the precise
> terminology they have today. There are many other details I could discuss,
> but they're not necessary for an overview of the problem.
>
> For those of you who find the task of restructuring English computing
> terminology too daunting but still crave clarity, you can learn Lojban
> (http://www.lojban.org/). I find Lojban a bit too computerish for my human
> language needs, and nobody I know speaks it, so I'll stick with attempting
> to improve English.
>
> --Senny
>
Received on Thu Apr 22 2004 - 18:46:13 CEST