Re: On the usefulness of tables definitions in RM...

From: paul c <toledobythesea_at_oohay.ac>
Date: Fri, 20 Aug 2010 15:56:47 GMT
Message-ID: <3Bxbo.152$89.81_at_edtnps83>


On 20/08/2010 6:47 AM, Cimode wrote:
> I lately came to the conclusion that teaching relation structures and
> manipulations by using tables inherently induces a bias to think of
> relations as relation values as opposed to relation variables.
> However, I believe that tables are elements of the presentation layer
> since they are only *one* possible representation in time of a
> specific relation.
>
> I am curious on whether this confuses more than it helps as far as
> operation definitions are concerned. What representations are to be
> preferred to avoid confusions ? In what context ?
>
>
> Opinions welcome.

Some oblique notes:

First, Codd:

1969 - "For expository reasons, we shall frequently make use of an array representation of relations, but it must be remembered that this particular representation is not an essential part of the relational view being expounded." No mention of "tables" AFAICT.

1970 - "The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5,6,7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank." He's not talking about relation storage here, but I can imagine some people taking it that way.

1970 - "The simplicity of the array representation which becomes feasible when all relations are cast in normal form is not only an advantage for storage purposes but also for communication of bulk data between systems which use widely different representations of the data." Here, he was talking of two-dimensional arrays.

1979 - "A relation then consists of a set of tuples, each tuple having the same set of
attributes. If the domains are all simple, such a relation has a tabular representation
with the following properties.

(1) There is no duplication of rows (tuples).
(2) Row order is insignificant.
(3) Column (attribute) order is insignificant.
(4) All table entries are atomic values."

From what I've seen, it was 1979 before he talked of 'tabular representation'.
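To make the first three of those 1979 properties concrete, here is a rough Python sketch, not anything Codd wrote. The helper names (make_tuple, make_relation) and the sample supplier/part values are my own assumptions, purely for illustration. It models a tuple as a set of (attribute, value) pairs and a relation as a set of such tuples, then contrasts that with a positional array.

```python
# Illustrative sketch only: make_tuple / make_relation are hypothetical helpers,
# not notation taken from Codd's papers.

def make_tuple(**attrs):
    """A tuple as a set of (attribute, value) pairs, so attribute order is irrelevant."""
    return frozenset(attrs.items())

def make_relation(*tuples):
    """A relation as a set of tuples: duplicates collapse, row order is irrelevant."""
    return frozenset(tuples)

# Same information, written with a duplicate row and in a different
# row and column order.
r1 = make_relation(
    make_tuple(s="S1", p="P1"),
    make_tuple(s="S2", p="P1"),
    make_tuple(s="S1", p="P1"),  # duplicate row, absorbed by the set
)
r2 = make_relation(
    make_tuple(p="P1", s="S2"),
    make_tuple(p="P1", s="S1"),
)

print(len(r1))   # 2
print(r1 == r2)  # True

# A two-dimensional array is positional, so shuffling rows changes the value.
a1 = [["S1", "P1"], ["S2", "P1"]]
a2 = [["S2", "P1"], ["S1", "P1"]]
print(a1 == a2)  # False
```

The sketch says nothing about property (4); it only shows why a picture of a table (or an array) carries ordering that the relation itself does not have.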

1990 - "1.5 • Tables versus Relations
Actually, the terms "relation" and "table" are not synonymous. As discussed earlier, the concept of a relation found in mathematics and in the relational model is that of a special kind of set. The relations of the relational model, although they may be conceived as tables, are then special kinds of tables.
"In this book they are called R-tables, although the term "relation" is still used from time to time to emphasize the underlying concept of mathematical sets, to refer to the model, or to refer to languages developed as part of implementations of the model.
"R-tables have no positional concepts. One may shuffle the rows without affecting information content. Thus, there is no nextness of rows. Similarly, one may shuffle the columns without affecting information content, providing the column heading is taken with each column. Thus, there is no "nextness" of columns.
"Normally, neither of these shuffling activities can be applied with such immunity to arrays. That is why I consider it extremely misleading to use the term "array" to describe the structuring of data in the relational model."

1979 - "The fact that relations can be perceived as tables, and that tables are similar to flat files, breeds the false assumption that the freedom of action permitted [with] tables or flat files must also be permitted when manipulating relations. The manipulation differences are quite strong." Note that this was written after the first well-known implementations, such as System-R at IBM.

This is what Chamberlin and Boyce wrote about SEQUEL in 1974, well before what Codd wrote in 1979:

1974 - "SEQUEL emphasizes simple data structures and operations. In a series of papers, E. F. Codd (5-9) has introduced the relational Model of data, which appears to be the simplest possible general-purpose data structure, and which provides a maximum degree of data independence. In this paper we deal only with normalized relations, which can be viewed as tables of n columns and a varying number of rows..."

I can imagine that the SEQUEL quote above was more responsible for legitimizing, in some people's eyes, the table representation than anything Codd wrote. Codd had mentioned "named relations" in his first two papers and it's not hard to see how people came to associate those with 'named tables'. In 1970, Codd wrote: "A general name would take a form such as R (g).r.d where R is a relational name; g is a generation identifier (optional); r is a role name (optional); d is a domain name." I imagine he might have been influenced to associate 'generations' with 'names' by the IBM file systems of the day, which had a feature called 'generation data groups'.

To give a partial opinion, it seems that the obvious attractiveness of 'named tables' as a storage mechanism has distorted some of Codd's original intention. Some examples I can think of are:

i) pre-existing file systems must have seemed an easy way to store 'stored tables', if you will. So one can imagine the next step/mis-step, associating a file with a table with a relation.

ii) once people started thinking in terms of 'stored tables' instead of 'named relations', the stage was set for the invention of 'relation variables', which must have seemed attractive for adapting pre-existing procedural languages to db manipulation.

Just to add a pet gripe (or maybe confusion) of my own - personally I think another distortion has to do with the projection or existential operator. Even to a non-logician like me, it seems patent that the operator demands two operands. But the fixed-dimension array or table leads us to ignore one of them. Maybe this is because such representations encourage us to forget about the structure of tuples. Hugh Darwen has an interesting paper on his site about 'multi-relations'. Apologies to him if I mis-read it, but it looked to me that the distortion continues by rolling together the attributes that projection leaves unmentioned, e.g., to use a commonly known relation, projecting Suppliers from SupplierParts usually leaves out mention of Parts. This effectively rolls Parts together with table_DEE. Whereas if a 'Darwenian' multi-relation were used to record both Suppliers and SupplierParts (or possibly even Parts), I think all three relations could be extracted/manipulated with a single 'structure'.
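For what it's worth, here is a small Python sketch of projection written so that both operands are explicit: the relation, and the set of attribute names to keep. This is only one conventional reading of projection as existential quantification over the unmentioned attributes, not necessarily what the multi-relations paper proposes. The function name project, the attribute names s and p, and the sample values are assumptions of mine. DEE is table_DEE as mentioned above; DUM (its empty counterpart) is something I've added for contrast.

```python
# Illustrative sketch only: 'project', the attribute names and the sample
# data are hypothetical.

def project(relation, attrs):
    """Keep only the named attributes of every tuple.

    Two operands: the relation and the attribute names to keep. Attributes
    that are not named are dropped, and the resulting duplicates collapse
    because the result is again a set of tuples.
    """
    attrs = frozenset(attrs)
    return frozenset(
        frozenset((a, v) for a, v in t if a in attrs)
        for t in relation
    )

DEE = frozenset({frozenset()})  # no attributes, one (empty) tuple
DUM = frozenset()               # no attributes, no tuples (added for contrast)

supplier_parts = frozenset({
    frozenset({("s", "S1"), ("p", "P1")}),
    frozenset({("s", "S1"), ("p", "P2")}),
    frozenset({("s", "S2"), ("p", "P1")}),
})

# Projecting on {s} drops p; three tuples collapse to two suppliers.
suppliers = project(supplier_parts, {"s"})
print(len(suppliers))  # 2

# Projecting on no attributes at all answers "is there any tuple?".
print(project(supplier_parts, set()) == DEE)  # True
print(project(DUM, set()) == DUM)             # True
```

Written this way the second operand can't be forgotten, whereas a picture of a fixed-width table tends to hide it.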
