c.d.theory lexicon overview
Date: Thu, 22 Apr 2004 07:39:47 GMT
Message-ID: <7FKhc.1320$17.155177_at_news1.epix.net>
Having observed this discussion for a while, and having been prodded by one of its participants, I've decided to throw in my two cents. I have a rather different take on the situation from most programmers, so maybe I can help with a little linguistic analysis.
WARNING: Much linguistic pedantry follows.
PRESENT PROBLEM: Database theory lacks a precise terminology.
ROOT PROBLEM: Computer science lacks a precise terminology. This is the real problem we have to tackle if we wish to fix the specific problem of database theory discussion.
PAINFUL EFFECTS:
- Much confusion results from this lack of precision, which really annoys us all from time to time. Countless hours and thousands, possibly millions, of dollars per year are lost on dealing with this confusion worldwide. (I'm serious.) I'd rather spend time getting a project done than arguing over the meaning of 'object'.
- Lack of unified terminology results in different programming environments
having distinct meanings of the same term. This complicates the joining of
different environments.
- English may be the primary language of computing, but it is not the only one. Other languages usually adopt our words (with phonological/graphological adjustments) or translate our words to create new terms. This problem is really messy, but since most people don't care too much, I won't go beyond mentioning it.
ASPECTS OF THE PROBLEM:
- Semantic overloading (Homonymity): Many word-forms we use have too many meanings. _function_ comes to mind.
- Semantic overlap (Synonymity and near-synonymity): To some people,
'domain', 'class' and 'type' mean the same thing; to others, they mean three
distinct things; to most, they are different yet have overlapping meaning.
- Semantic fuzziness: Some words sort-of have one meaning, but nobody is quite sure what that meaning is. 'object' is a prime example. (Compare 'object' to 'pornography'--in both cases, many would say, "I can't tell you what it is, but I know it when I see it.")
- Morphological inconsistency: As James L. Ryan hinted at in his "Grammatical Inconsistencies" post, we say "join", "projection", "intersection" and "union" as nouns and "join", "project", "intersect", and ?"union" as verbs. Loosely speaking, in English, we have a marked tendency to use nouns as verbs and adjectives, and verbs as nouns, adjectives as nouns and verbs, etc.
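As a rough illustration of the first two aspects (overloading and overlap), here is a minimal Python sketch that audits a toy lexicon of (form, meaning) pairs. The lexicon entries, the meaning labels and the name `audit` are all hypothetical, made up for this example; they are not proposed definitions. The idea is only that a form with more than one meaning is overloaded, and a meaning with more than one form is an overlap.

```python
from collections import defaultdict

# Hypothetical toy lexicon of (form, meaning) pairs. The meaning labels
# are illustrative assumptions only, not definitions from this post.
LEXICON = [
    ("function", "mathematical mapping"),
    ("function", "callable subroutine"),
    ("function", "purpose or role"),
    ("type", "set of permitted values"),
    ("domain", "set of permitted values"),
    ("modem", "modulator-demodulator device"),
]

def audit(pairs):
    """Return (overloaded forms, overlapping meanings) for a lexicon."""
    meanings_of = defaultdict(set)  # form -> meanings it carries
    forms_of = defaultdict(set)     # meaning -> forms that express it
    for form, meaning in pairs:
        meanings_of[form].add(meaning)
        forms_of[meaning].add(form)
    # A form with several meanings is semantically overloaded.
    overloaded = {f: sorted(m) for f, m in meanings_of.items() if len(m) > 1}
    # A meaning with several forms is a semantic overlap.
    overlapping = {m: sorted(f) for m, f in forms_of.items() if len(f) > 1}
    return overloaded, overlapping

overloaded, overlapping = audit(LEXICON)
print("overloaded forms:", overloaded)
print("overlapping meanings:", overlapping)
```

Semantic fuzziness is exactly the case this kind of check cannot catch, since it assumes each meaning can be written down as a distinct label in the first place.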
PROOF THAT THIS PROBLEM DOES NOT HAVE TO EXIST:
Some scientific fields, namely biology and medicine, have very well-defined
terminologies. While there may be some confusion in their fields, it is
very limited. Our lives as programmers would be easier if we could attain
that level of clarity.
GOAL:
The goal is to come as close as possible to having a one-to-one mapping
between word-forms and word-meanings (lexemes), with allowances for
morphological variants like plurals. Since I don't have italics in plain
text, I'll mark forms by surrounding a form with underscore characters.
Lexemes are in single quotes. For example, the forms _find_ and _found_
HOW TO APPROACH THE GOAL:
There is no absolutely right way to solve the problem. However, here are my
starting suggestions.
_function_, _attribute_, _entity_, _domain_. These forms represent so many
different things that even in limited contexts they can still be confusing.
Using these words in computer science is kind of like a biologist using the
word "creature".
_pointer_, _process_, _thread_, _operating system_, _network_, _array_,
suffices to remove ambiguity. We can keep such forms, but it would be nice
if we could find better ones. Probably at least some of the meanings of
each of these forms should be provided new forms.
_signed_, _modem_, _processor_.
_function_ to yet another lexeme was a good idea.
_down_ and _load_), etc. Many acronyms which later became accepted as
terms in their own right come from this approach, e.g., _FTP_, _DNS_,
_MIME_, _SQL_, _RAM_, _grep_, _RFC_, etc. The success of this approach
depends on the clarity of the base forms used to create the derived forms.
BARRIERS TO SOLVING THE PROBLEM:
- Tradition: We inherit words from people already using them, whether
they're good or not. Computer science (and especially database theory)
inherited much from mathematics, a field with a fuzzy, context-dependent
terminology (mostly evident in its notation). There has never been a major
concerted effort to clarify our vocabulary. We have thousands of terms in
today's computing lexicon, and at least hundreds (including some of the
most common ones!) are problematic. How can we fight this?
- Ad-hockery: When people come up with a new idea, they often hastily assign some label to it without much consideration for the future. Worse yet, they often usurp an existing form. This is especially the case in our field. For example, a Java 'attribute' is *nothing like* a C# 'attribute' (but rather a C# 'field'). (Aside: If I see one more new use of the forms
_attribute_ or _function_, I might puke.) Even if we successfully combat
tradition, we have to beware of ad-hockery, or the problem will reappear.
- Conceptual confusion: Computing is so new and so constantly changing that in many senses, we don't know what we're doing. It's hard to put a label on a concept that we can't put our finger on, so to speak. Context is a major issue. We have layers of abstraction upon abstraction. The layering effect gives rise to differentiation between terms like 'primitive type' and 'abstract data type' or 'type' and 'class'. A variable's 'value' can be another 'variable'. Are 'type' and 'domain' the same thing? If so, how? I could go on forever on this point, so I'll stop here.
CONCLUDING REMARKS: I hope I have adequately explained the scope of the problem. It's bigger than most people realize. I don't know if it will be conquered, but I know it can be. After all, physicians didn't always have the precise terminology they have today. There are many other details I could discuss, but they're not necessary for an overview of the problem.
For those of you who find the task of restructuring English computing terminology too daunting but still crave clarity, you can learn Lojban (http://www.lojban.org/). I find Lojban a bit too computerish for my human language needs, and nobody I know speaks it, so I'll stick with attempting to improve English.
--Senny
Received on Thu Apr 22 2004 - 09:39:47 CEST