(repost) cdt glossary 0.1.1
Date: Wed, 26 Jul 2006 01:32:13 +0200
Glossary 0.1.1, February 2006
"You keep using that word. I do not think it means what you think it means" -- Inigo Montoya
This glossary seeks to limit lengthy misunderstandings in comp.database.theory. This newsgroup uses terms from database modeling, design, implementation, operations, change management, cost sharing, productivity research, and/or basic database research.
People tend to assume that words mean what they are accustomed to, and take for granted that the other posters have about the same connotations. They don't always.
The glossary consists of signposts: watch out! You may think the OP means A but she might mean B. Alternative names and views of the same concept are introduced when the danger of mutual misunderstandings is apparent. When context matters, it is provided. The glossary is a highly biased list of problematic concepts.
Some words are particularly suspect: data (!), database, object, normalisation.
Some just cause minor annoyances; the misunderstanding is cleared up and the discussion goes on: domain, type, transaction.
We don't know well-accepted, formal or comprehensive
definitions for everything. If you do have a useful
reference, please provide it.
If an informal description is all we have, so be it.
What the glossary is not:
The glossary is not a dictionary or encyclopedia, such as FOLDOC, Wikipedia (http://www.wikipedia.org), and the Web Dictionary of Cybernetics and Systems.
Specific links to serve the glossary's purpose are welcome, of course. Also, it does not try to be a FAQ for "all things database".
The glossary is built from contributions. Contributions from within this group are not credited; quotes from elsewhere are. If you want your name stated, please say so.
If you want to contribute, see the notes at the end.
[Address]
A value, used to identify a location.
What is to be found there is up to the rest of the system.
An address is a value used to locate ... A reference is a value used to refer ...
The difference between *locate* and *refer* is crucial here.
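To make the locate/refer distinction concrete, here is a small sketch in Python. Everything in it is an assumption made for illustration: the list `store` stands in for storage, a list index plays the role of an address, and the `directory` dict is a made-up way for "the rest of the system" to keep references valid when things move.

```python
# Hypothetical storage: a list of slots, where an address is a slot index.
store = ["record of ann", "record of bob"]
address_of_bob = 1            # an address locates slot 1, whatever ends up there

# Hypothetical directory kept by the system: maps a reference to the current slot.
directory = {"ann": 0, "bob": 1}

def dereference(ref):
    """Return what the reference refers to, wherever it currently resides."""
    return store[directory[ref]]

def insert_at_front(ref, record):
    """Relocate every existing record by one slot and keep the directory in step."""
    store.insert(0, record)
    for key in directory:
        directory[key] += 1
    directory[ref] = 0

insert_at_front("zed", "record of zed")

found_by_address = store[address_of_bob]    # slot 1 now holds ann's record
found_by_reference = dereference("bob")     # still bob's record
print(found_by_address, "|", found_by_reference)
```

The address still identifies a location, but what is found there has changed; the reference still leads to the same thing.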
[Change management]
Many organizations have a change management (CM) process in place in order to make their evolution more manageable. The organization of data within a database can and will change as circumstances change. A DBMS should provide facilities to support this. Changing the underlying structure should be possible without affecting what is already stored. For example, you can add a column to a table without losing what is already there.
Related adjectives: maintainable, agile, flexible, adaptive.
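A minimal sketch of the add-a-column example, using Python's standard sqlite3 module and an in-memory database. The table `customer` and its columns are invented for illustration; other DBMSs may behave differently in the details.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Inigo')")

# Change the structure: add a column to the existing table.
conn.execute("ALTER TABLE customer ADD COLUMN city TEXT")

# The row stored before the change is still there; the new column holds NULL.
rows_after_change = conn.execute("SELECT id, name, city FROM customer").fetchall()
print(rows_after_change)
```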
[Class]
A class is what provides a name and a place for the abstract behavior of a set of objects said to belong to the class. (Larry Wall, Apocalypse 12)
Other definitions are welcome; this goes for the other entries as well, of course.
Some use 'class' as having exposed data. Please be explicit about this if you do so.
[Data]
"Known facts that can be recorded and have implicit meaning." -- Fundamentals of Database Systems, Elmasri & Navathe.
When people discuss data in the context of databases, they are usually talking about something with meaning. There are people who think that data doesn't need to mean anything.
http://en.wikipedia.org/wiki/Information#Information_is_not_data (currently) says "information has meaning (i.e.: can inform), while data does not."
Somehow this "data has no meaning" idea has caught on.
1b. a record on a medium of some fact in the real world.
2. encoded information
3. a combination of sign and meaning
Warning: tongue in cheek definition
Information is what you want. Data is what you are given.
[Database]
"A logically coherent collection of related real-world data assembled for a specific purpose." -- rephrased from "Fundamentals of Database Systems", Elmasri & Navathe.
- Deluxe file system
- Shared databank (E. Codd)
[Data model]
"an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data. The operators allow us to model its behavior."
(C. J. Date, An Introduction to Database Systems, 8e, 2003, p 15-16)
Data models are artificial constructs and may not completely represent the true nature of information and categorization. These categories already exist, to some degree, in the way information is handled outside the database.
Databases don't exist in vacuo; they're fed (and consulted) by users who would have some system of mental categorization even if they were shuffling everything around with paper and pencil.
[Dimension]
1) A synonym for degree.
A relation R is of degree n if each tuple in R is an n-tuple.
2) An n-dimensional data structure, S, is one where each element of S can be uniquely addressed as S[i1][i2]...[in]
Note: A table in a SQL-DBMS can be seen as a conventional visualization of a mathematical relation, whose dimension is as in 1) above. It can also be manipulated using a general purpose programming language, where its dimension in the sense of 2) above equals 2. This is why the term can cause confusion.
In this forum, use definition 1) freely and try to either avoid 2) or be very clear, such as "2D array," when employing definition 2).
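The two senses can be put side by side in a short Python sketch. The relation `employee` and its contents are invented for illustration, and `nesting_depth` is a hypothetical helper, not a standard function.

```python
# Sense 1: the degree of a relation is the length of each of its tuples.
# The relation and its contents are invented for illustration.
employee = {
    ("Smith", "Sales", 1200),
    ("Jones", "Sales", 1500),
    ("Brown", "Research", 1800),
}
degree = len(next(iter(employee)))      # every tuple is a 3-tuple

# Sense 2: the same data laid out as rows and columns is addressed
# with two indexes, S[i][j].
grid = [list(t) for t in sorted(employee)]

def nesting_depth(s):
    """Count how many indexes are needed to reach a non-list element."""
    depth = 0
    while isinstance(s, list):
        depth += 1
        s = s[0]
    return depth

addressing_dimension = nesting_depth(grid)
cell = grid[0][1]                       # row 0, column 1

print(degree, addressing_dimension, cell)
```

The same data has "dimension" 3 in sense 1) and 2 in sense 2), which is exactly where the talking at cross purposes starts.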
[Domain]
1. Given a relation R, a domain is a set Sn such that for each tuple (A1, A2, ... An, ... Am) in R, An is an element of Sn.
2. A domain is a set of values, for example: "integers between 0 and 255", "character strings less than 10 characters long", "dates".
Sometimes used synonymously with type.
[Entity]
Thing of interest. (ISO)
"An entity is a 'thing' which can be distinctly identified. A specific person, company, or event is an example of an entity." (P. Chen, "The Entity-Relationship Model - Toward a Unified View of Data", 1976, http://www2.cis.gsu.edu/dmcdonald/cis8140/Chen.pdf)
Edward Yourdon, who describes E/R in his work Modern Structured Analysis (Prentice Hall, 1989), defines the concept of Entity as having three properties:
- Each representation of an entity can be uniquely identified.
- Each representation of an entity plays an important role in the system it lives in (it has to have a reason to be there).
- Each representation of an entity can be described by one or more attributes (data elements, like name, age, quantity).
This term is often used when doing conceptual data modeling. When it is used with a particular product, technique, or technology, such as XML, qualify the term with an adjective naming that "namespace", such as "XML entity", to distinguish it from the more generic use of the term.
For subtleties (e.g. strong and weak entity) - please search the web.
[Fact]
1. A piece of information about circumstances that exist or events that have occurred.
2. A concept whose truth can be proved.
3. A statement or assertion of verified information.
4. An event known to have happened or something known to have existed.
[Flat]
1) An object which by any definition could be considered as 2 dimensional might informally be called flat.
2) The absence of hierarchy (multiple levels of detail).
Note: Any use of the term flat tends to be seen as inflammatory by someone, so take care to use it only when intending to inflame ;-)
[Function]
For now we have to live with different meanings of _function_ when talking about databases: "The function of this function is to get the tuples from B that are functionally dependent on A."
Three different contexts, but just about the same meaning:
- A purpose or use.
- A binary mathematical relation with at most one b for each a in (a,b).
- Software: A subroutine, procedure, or method.
notes:
every operator is a function
every function is a relation
Please be specific.
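The mathematical sense, and the note that every function is a relation, can be checked mechanically. The following is only a sketch in Python; `is_function` and the two example relations are made up for illustration.

```python
def is_function(pairs):
    """Return True if the binary relation `pairs` has at most one b per a."""
    seen = {}
    for a, b in pairs:
        if a in seen and seen[a] != b:
            return False      # a second, different b for the same a
        seen[a] = b
    return True

# Both are binary relations (sets of pairs); only the first is a function.
squares = {(1, 1), (2, 4), (3, 9)}
roots = {(1, 1), (1, -1), (4, 2), (4, -2)}

squares_is_function = is_function(squares)
roots_is_function = is_function(roots)
print(squares_is_function, roots_is_function)
```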
[Information]
0. data in context, data with meaning.
(This implies a definition of data as being without context, without meaning - see data)
1. new data to the receptor.
2. available data, relevant to some decision or action.
[Information principle] (RM)
Chris Date in "EDGAR F. CODD 08/23/1923 – 04/18/2003 A TRIBUTE":
The entire information content of a relational database is represented in one and only one way: namely, as attribute values within tuples within relations.
A value, used to identify something.
See also primary key, and (TO DO:) foreign key.
(meaning vs use)
Say we currently have a validated statement about the exchange rate of some stock at some recent time.
- It does not matter to the meaning where/how this statement is represented. We have it.
- To the use of it it is important where/how it is represented, and available to relevant actors.
- Twenty years later the meaning of this statement is still the same.
- Twenty years later most of its usefulness will probably have gone.
In some instances it may not be appropriate to make this distinction. The meaning of data is always contextual. The same bit of data means different things to different structured viewpoints within the organization, for example, and at different times (epochs). One grain of sand does not form a beach. One bit of data by itself has little meaning. It is rather the collective of all data that possesses a greater notion of meaning.
1. One name for the industry surrounding the Nelson-Pick data model. In this context:
FILE: a real-world collective noun.
RECORD: a real-world object.
FIELD: a real-world adjective.
2. A data field (or attribute) defined to permit a variable number of values as a list (array).
[NULL]
Roughly: a special marker that can be put in a place inside a data structure where an actual value is expected. Precisely what that marker means varies and there are at least three possibilities that are sometimes assumed:
(1) "Unknown value" This means that in the place of the marker there should actually be a value, but this value is not known at the present time. For example, if a 'name' field in a tuple describing a person is 'null' then this person has a name but we don't know it.
(2) "Absent value" This means that the property that is described by the value in question is simply not defined. For example, if the 'shipping-date' field in a tuple describing an order is 'null' then the order has not been shipped yet.
(3) "Whatever SQL says it means" The exact meaning is hard to summarize briefly, but it is a mixture of the previous two interpretations and involves a logic with three truth values ('true', 'false' and 'unknown').
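For interpretation (3), a small experiment shows some of what SQL says. This is a sketch using Python's standard sqlite3 module; the `orders` table is invented for illustration, and the results are those of SQLite, which on these particular points follows the usual SQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL is neither true nor false in SQL: it evaluates to NULL (unknown),
# which Python's sqlite3 module returns as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Hypothetical table: order 2 has no shipping date recorded.
conn.execute("CREATE TABLE orders (id INTEGER, shipping_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2006-02-01"), (2, None)])

# WHERE keeps only rows for which the condition is true, so "= NULL" matches nothing.
matched_by_equals = conn.execute(
    "SELECT id FROM orders WHERE shipping_date = NULL").fetchall()
matched_by_is_null = conn.execute(
    "SELECT id FROM orders WHERE shipping_date IS NULL").fetchall()

print(null_equals_null, matched_by_equals, matched_by_is_null)
```

Note that nothing here tells us whether order 2 is unshipped (absent) or shipped on a date nobody recorded (unknown).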
- Confusion arises when people use terms like "null value", a paradox to some, a contradictio in terminis to others.
- Confusion arises due to the fact that nullness (the absence of value) is often represented on computers by the number 0. (Obviously, 0 is not null.)
- In some contexts, 'null' and 'nil' mean the same thing; in others, they do not.
In databases, NULL is traditionally both used and opposed. If you want to go into this, please first search for mu NIL void NULL undef, 2VL 3VL.
"It isn't the things we don't know that give us trouble. It's the things we know that ain't so." - Will Rogers
Note: Several better proposals have been made for this entry. Unfortunately they all led to huge threads where the maintainer couldn't decide which texts to quote here.
[Object]
1. Model of an entity, characterised by behaviour and state. (ISO)
2. Something intelligible or perceptible by the mind.
Table: A collection of columns (the table header) and rows (the body).
Row: A collection of values, conforming to the table header columns.
One table may contain data about one entity,
about several entities, about one or several
relationships or any combination.
A column can be seen as an attribute of the entity (or of one of the entities or relationships) that the table is about.
[Primary key] (SQL, not RM)
A key of a table, composed of one or more named columns, that uniquely identifies a row in the table. A table can have only one primary key.
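A sketch of a primary key composed of two columns, using Python's standard sqlite3 module. The `enrolment` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose primary key is composed of two named columns.
conn.execute("""
    CREATE TABLE enrolment (
        student TEXT,
        course  TEXT,
        grade   TEXT,
        PRIMARY KEY (student, course)
    )
""")
conn.execute("INSERT INTO enrolment VALUES ('ann', 'db101', 'A')")
conn.execute("INSERT INTO enrolment VALUES ('ann', 'logic', 'B')")  # differs in course: accepted

try:
    # Same (student, course) as the first row: the key would no longer identify one row.
    conn.execute("INSERT INTO enrolment VALUES ('ann', 'db101', 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
print(duplicate_rejected, row_count)
```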
" TYPES are sets of things we can talk about;
RELATIONS are (true) statements about those things." -- Chris Date, Feb 2004
- Set of possible values (i.e. IT equivalent of math 'domain').
- Set of possible values plus all possible operators defined on them. (i.e. synonymous to Class if 'class' is meant to include a possible set of values).
This is a highly misunderstanding-prone area, so please take some care to be specific.
[Type - 3rdM]
In The Third Manifesto a type is:
- a pattern (possible representation)
- a domain for some operators (THE_xxx operators)
- a codomain for some operators (the "constructors")
There is a requirement for the 'domain' and the 'codomain' to be the same set.
[Reference]
A reference is a value, used to refer to something. A program can get the current value of that something (without ever knowing where it resides) by dereferencing, even if that something has been relocated between the time of first reference and the dereferencing.
[References, pointers, keys]
While references may be implemented as pointers, the programmer prefers not to know (if he prefers to know he should have used pointers).
In some programming languages one can declare
variables of a pointer type - these variables
can have pointer values.
The same holds, mutatis mutandis (m.m.), for references.
Two operations are supported:
referencing and dereferencing.
On references only these operations are possible. On pointers other operations are possible.
The dereferencing operation takes a pointer *value* and returns a *variable* of the type the pointer refers to.
The referencing operation is the inverse operation. It takes a *variable* and returns a pointer *value*. The same holds, m.m., for references.
In Java the term pointer was avoided
because pointer is often used to mean
physical memory addresses.
Relational keys are not pointers.
[Relation]
1. A relation is a subset of the set of ordered tuples (A1, A2, ... Am) formed by the Cartesian cross-product of sets S1 x ... x Sm where each An is an element of Sn.
Note: Nothing prevents the same set from participating in a relation more than once. In math, two identical sets Sx and Sy (x <> y, Sx is a subset of Sy and Sy is a subset of Sx) are distinguished by their ordinal position; in relational theory, in contrast, they are distinguished by attribute name.
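The definition and the note can be illustrated with Python sets. The sets `S1` and `S2`, the relation `R`, and the attribute names `lower` and `upper` are all invented for this sketch.

```python
from itertools import product

S1 = {"ann", "bob"}
S2 = {1, 2, 3}

# The Cartesian product S1 x S2 is the set of all possible ordered pairs.
cross_product = set(product(S1, S2))

# A relation over S1 and S2 is any subset of that product.
R = {("ann", 1), ("bob", 3)}
is_relation = R <= cross_product

# The same set may participate twice: "less than" is a subset of S2 x S2.
# Mathematically, the two roles are told apart by position in the pair.
less_than = {(a, b) for a, b in product(S2, S2) if a < b}

# In relational theory the roles are told apart by (hypothetical) attribute names.
named_tuples = [{"lower": a, "upper": b} for a, b in sorted(less_than)]

print(len(cross_product), is_relation, sorted(less_than))
```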
[Transaction]
A set of database operations constituting a logical unit of work. Most DBMSs include the ability to roll back complete transactions when an error is detected.
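A sketch of a rolled-back unit of work, using Python's standard sqlite3 module with its default transaction handling. The `account` table, the CHECK constraint, and the amounts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: balances may not go below zero.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    # One logical unit of work: move 150 from a to b.
    conn.execute("UPDATE account SET balance = balance + 150 WHERE name = 'b'")
    conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'a'")  # violates CHECK
    conn.commit()
except sqlite3.IntegrityError:
    # Undo the whole unit, including the first update that had succeeded.
    conn.rollback()

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

Without the rollback, account b would have gained 150 that account a never gave up.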
RELATIONs vs. RELATIONSHIPs
Can namespaces help to make some distance? In this case: RM.RELATION vs. ER.RELATIONSHIP
represented vs. described
RELATION(SHIP)s vs RELATION(SHIP)s SET
fact vs. thing (ENTITY).
First Order Logic vs. Higher Order Logic.
What, if anything, is the equivalent of an ENTITY(SET) in the RM?
Does it make sense to talk about attributes of a fact? How are those different from ATTRIBUTES of an ENTITY?
Traditionally there can be multivalued ATTRIBUTES in ER, whereas RM has atomic ATTRIBUTES. So: RM.ATTRIBUTE and ER.ATTRIBUTE?
In ER modeling, a RELATIONSHIP is defined over ENTITIES: "A relationship is an association between several entities." In RM, a RELATION is defined over VALUEs. What is the difference between ENTITIES and VALUEs?
(please feel invited to write entries for these)
Dynamic vs static
Feel free to post suggestions to add or remove.
How to contribute
Please keep in mind that the focus of the glossary is on /real/ c.d.t. misunderstandings.
Some discussions, after many sidetracks, are reducible to /just/ different meanings and connotations of a word. The differences could be resolved with just: "Ah, now I see what you meant by that; next time I'll be a little more careful in my choice of words". Such words are nice glossary candidates.
Examples from the past: Address, Domain.
Sometimes, though, it's not just a different connotation or meaning which leads to the long, winding talks without communication. These differences go down to deeply held, strong opinions.
Some differences in the use of words run much deeper than we can hope to clear up with just some definitions and warning signposts. They might help a little anyway, so these nastier entries are welcome, too.
Examples from the past: NULL, Flat.
Please post your proposal as copy & pastable text, with a subject line like this:
subject: cdt glossary [Identity]
Please also check spelling and grammar mistaeks.
Thank you for contributing.
Milestones? For the glossary I prefer inch-pebbles.