Re: RA with MV attributes

From: Marshall <marshall.spight_at_gmail.com>
Date: 17 Jan 2007 01:07:00 -0800
Message-ID: <1169024820.624939.35370_at_v45g2000cwv.googlegroups.com>


On Jan 16, 6:51 pm, "David" <davi..._at_iinet.net.au> wrote:
> Marshall wrote:
> > On Jan 16, 4:48 pm, "David" <davi..._at_iinet.net.au> wrote:
>
> > Furthermore you will find that this property holds true of many
> > functions over generic types. Consider a simple generic list
> > type supporting a "get first element" operation. The type of
> > the generic list is parameterized by the element type T.
> > The type of getFirstElement() is int -> T. Hence the getFirstElement
> > operation could also be said to "calculate the type" but if you
> > code this up in, say, C++, the type calculating occurs entirely
> > at compile time.
>
> > > To avoid the infinite regress you should say
> > > 1. A relation value r comprises both A(r) and T(r).
> > > 2. The *type* of all relation values is the same: ie "relation"
> > > 3. T(r) on its own is not a relation value
> > > 4. The type of T(r) is "set of tuples", not "relation"
>
> > > BTW saying "r is a relation value" is naff. Really should say "r
> > > is a relation".
>
> > That doesn't work for me. In 2, your proposal only considers
> > abstract, parameterized type "relation" and doesn't consider
> > concrete relation types.
>
> 1 & 2 are saying *by definition* there is a type called "relation".

Right.

> It is neither abstract nor parameterised.

Uh, no. It absolutely is abstract, because there is no value whose most specific type (or "concrete type" if you prefer) is simply "relation."
Relation types are parameterized by a set of attributes.

By definition.

> Why can't we say that *by definition* the set of attributes of a
> relation is part of its state not its type?

We certainly *can*; the question is whether we *should.* And I'm prepared to argue that we shouldn't, because we lose useful properties and useful distinctions.

> Here's an analogy. A C++ library provides a matrix class that is not
> parameterised on the number of rows and columns. You can write this
>
> void foo(int n,int m)
> {
> Matrix A(n,3), B(3,m);
> Matrix C = A * B; // Size = n x m
> }
>

Well, what is it a matrix *of*? Something goes in the cells, and that something has a type. If your matrix class only supports a single fixed cell type, then it is not a parameterized type (and it is not a good analog for relations, unless we mean relations of only a single fixed type.) If it supports arbitrary cell types, then it is parameterized by the cell type.

You know:

  new Matrix<int>(3, 3);

Note the syntax: the angle brackets are parameter passing at compile time; the parentheses are parameter passing at runtime.

> The non-parameterised type also allows for mutative operations that add
> or remove rows and columns.

Sure; some matrix/array/whatever types allow this, *if and only if* the width
and height of the matrix is not part of the type. In C++, for example, it
is possible to go either way; you can have a matrix type that takes a cell
type, a width, and a height at compile time, and complains if you attempt
a missized matrix multiplication.

  new Matrix<int, 3, 3>();

Such a type can support a higher degree of correctness checking at compile time, but cannot support modification of the type parameters (such as width and height) at runtime. It is a tradeoff.

> > In 4, you draw a distinction between
> > a set of tuples and a relation, but those are equivalent as far
> > as I know. Can you explain the distinction?
> 4 follows from 1 which is a definition of the type "relation". My
> point is that it is self consistent.

I agree it is consistent.

> Consider this: In mathematics a function is defined on a particular
> domain and codomain. Restrict the domain and you have a *different*
> function. Otherwise we can't make statements about whether a given
> function is 1:1 or onto. This suggests that domain and codomain
> should be regarded as *properties* of a given function.

The choice of functions for you example is a poor one, because functions
are considered immutable constant values in virtually every programming language. And immutable constant values do not have any distinction in value between compile time and runtime. Any operation you care to do on them can be done in either phase. The distinction I'm proposing is most relevant when considering the difference between the constant *type* of an expression and its varying-from-run-to-run value.

> A similar
> thing seems to be true for a relation - it is more than its set of
> tuples. Note in any case that I defined a tuple to be a mapping on
> A(r), so therefore you can't really disconnect the tuple from the set
> of attributes anyway.

I'm not proposing "disconnecting" them per se; I'm trying to emphasize that type and value may exist at different phases in computation. They remain intimately connected the same way that "3" and "int" are connected.
But I would not say that "3" is part of the type nor that "int" is part of

the value.

Types may disappear (or not) at runtime; values do not exist at compile time. If we remove this distinction, we remove valuable capabilities from our language.

A simple example:

Consider the idea of join compatibility. This is violated if we attempt to join two relations each having an attribute of a given name but a different type in each relation. Now, whatever your feelings about how this should be handled, warning, error, whatever, we have to consider the question of when and whether this situation can be detected.

Under a system in which we consider the attributes as part of the static type, and not part of the value, the presence or absence of join compatibility in a given query can be established by looking *only at the query* and not at the value of any of the operands. In other words, if the query is included in some source code, we can analyze the source code *without ever running it* and determine exactly the status of join compatibility. (Let me know if this needs further explanation.)

Under a system in which we consider the attributes as part of the value, and under which we have a single universal type "relation"; we *cannot* determine this ahead of time. Join compatibility checking *has to* be deferred until runtime, and it has to be checked every time a join occurs.

> > > I presume in a proper general purpose system strong typing of relvars
> > > won't work. Having to specify the type of relvars would be painful.
>
> > Well, the term "strong typing" is fraught with peril, but assuming
> > you mean "static typing" then I don't see why you think this
> > would be an issue. Consider that a CREATE TABLE statement
> > specifies the type of relvars.
>
> > > It would also disallow conditional projection, such as
>
> > > relation r = test ? r1 : proj(r1,B)
>
> > > ...not that I have an example of where that would be useful.
>
> > I think you will find that *every* language with a ? : operator
> > does not allow the two possible results to have incompatible types.
>
> That's my point. That's why it seems better to have a
> non-parameterized type called "relation".

It absolutely is not "better." It is different. You gain some things and you lose some things. We could only consider it better if we a priori exclude some considerations and focus only on a subset.

It is certainly entirely possible to have a dynamic relational language.
It would be able to express more algorithms: those that are safe but not provably safe. It would not be well-suited to static analysis; we could not prove the absence of join compatibility for example.

Different. A tradeoff.

Marshall

PS. As this thread develops, it begins to appear that the question of whether the header is part of the values is almost the same question as whether we have a static or a dynamic type system. Received on Wed Jan 17 2007 - 10:07:00 CET

Original text of this message