Re: RA with MV attributes

From: David <davidbl_at_iinet.net.au>
Date: 17 Jan 2007 02:45:13 -0800
Message-ID: <1169030713.900870.134270_at_a75g2000cwd.googlegroups.com>


Marshall wrote:
> On Jan 16, 6:51 pm, "David" <davi..._at_iinet.net.au> wrote:
> > Marshall wrote:
> > > On Jan 16, 4:48 pm, "David" <davi..._at_iinet.net.au> wrote:
> >
> > > Furthermore you will find that this property holds true of many
> > > functions over generic types. Consider a simple generic list
> > > type supporting a "get first element" operation. The type of
> > > the generic list is parameterized by the element type T.
> > > The type of getFirstElement() is int -> T. Hence the getFirstElement
> > > operation could also be said to "calculate the type" but if you
> > > code this up in, say, C++, the type calculating occurs entirely
> > > at compile time.
> >
> > > > To avoid the infinite regress you should say
> > > > 1. A relation value r comprises both A(r) and T(r).
> > > > 2. The *type* of all relation values is the same: ie "relation"
> > > > 3. T(r) on its own is not a relation value
> > > > 4. The type of T(r) is "set of tuples", not "relation"
> >
> > > > BTW saying "r is a relation value" is naff. Really should say "r
> > > > is a relation".
> >
> > > That doesn't work for me. In 2, your proposal only considers
> > > abstract, parameterized type "relation" and doesn't consider
> > > concrete relation types.
> >
> > 1 & 2 are saying *by definition* there is a type called "relation".
>
> Right.
>
>
> > It is neither abstract nor parameterised.
>
> Uh, no. It absolutely is abstract, because there is no value whose
> most specific type (or "concrete type" if you prefer) is simply
> "relation."

No that's not what I mean. Consider a C++ program that defines a concrete class called Relation that stores the set of attributes and the tuples. This is a concrete non-parameterized class.

We could write

void foobar()
{

    Relation r; // An empty relation with no attributes or tuples     r.AddAttribute("x", "double");
}

> Relation types are parameterized by a set of attributes.
>
> By definition.
>
>
> > Why can't we say that *by definition* the set of attributes of a
> > relation is part of its state not its type?
>
> We certainly *can*; the question is whether we *should.* And I'm
> prepared to argue that we shouldn't, because we lose useful
> properties and useful distinctions.
>
>
> > Here's an analogy. A C++ library provides a matrix class that is not
> > parameterised on the number of rows and columns. You can write this
> >
> > void foo(int n,int m)
> > {
> > Matrix A(n,3), B(3,m);
> > Matrix C = A * B; // Size = n x m
> > }
> >
>
> Well, what is it a matrix *of*? Something goes in the cells, and that
> something has a type. If your matrix class only supports a single
> fixed cell type, then it is not a parameterized type (and it is not a
> good analog for relations, unless we mean relations of only a
> single fixed type.) If it supports arbitrary cell types, then it is
> parameterized by the cell type.
>
> You know:
>
> new Matrix<int>(3, 3);
>
> Note the syntax: the angle brackets are parameter passing at compile
> time; the parentheses are parameter passing at runtime.
>
>
> > The non-parameterised type also allows for mutative operations that add
> > or remove rows and columns.
>
> Sure; some matrix/array/whatever types allow this, *if and only if* the
> width
> and height of the matrix is not part of the type. In C++, for example,
> it
> is possible to go either way; you can have a matrix type that takes a
> cell
> type, a width, and a height at compile time, and complains if you
> attempt
> a missized matrix multiplication.
>
> new Matrix<int, 3, 3>();
>
> Such a type can support a higher degree of correctness checking at
> compile time, but cannot support modification of the type parameters
> (such as width and height) at runtime. It is a tradeoff.

Agreed

> > > In 4, you draw a distinction between
> > > a set of tuples and a relation, but those are equivalent as far
> > > as I know. Can you explain the distinction?
> > 4 follows from 1 which is a definition of the type "relation". My
> > point is that it is self consistent.
>
> I agree it is consistent.
>
>
> > Consider this: In mathematics a function is defined on a particular
> > domain and codomain. Restrict the domain and you have a *different*
> > function. Otherwise we can't make statements about whether a given
> > function is 1:1 or onto. This suggests that domain and codomain
> > should be regarded as *properties* of a given function.
>
> The choice of functions for you example is a poor one, because
> functions
> are considered immutable constant values in virtually every programming
> language. And immutable constant values do not have any distinction
> in value between compile time and runtime. Any operation you care to
> do on them can be done in either phase. The distinction I'm proposing
> is most relevant when considering the difference between the constant
> *type* of an expression and its varying-from-run-to-run value.

I'm not sure...

Out of interest, do you think it is reasonable to write domain(f) for the domain of function f, or would you say that is evil somehow?

> > A similar
> > thing seems to be true for a relation - it is more than its set of
> > tuples. Note in any case that I defined a tuple to be a mapping on
> > A(r), so therefore you can't really disconnect the tuple from the set
> > of attributes anyway.
>
> I'm not proposing "disconnecting" them per se; I'm trying to emphasize
> that type and value may exist at different phases in computation. They
> remain intimately connected the same way that "3" and "int" are
> connected.
> But I would not say that "3" is part of the type nor that "int" is part
> of
> the value.
>
> Types may disappear (or not) at runtime; values do not exist at
> compile time. If we remove this distinction, we remove valuable
> capabilities from our language.

I'm certainly not arguing against parameterised types in general, just as I'm sure you are not advocating that types always be fully parameterised.

I also think we both have a pretty good understanding of the tradeoffs.

> A simple example:
>
> Consider the idea of join compatibility. This is violated if we attempt
> to join two relations each having an attribute of a given name but a
> different type in each relation. Now, whatever your feelings
> about how this should be handled, warning, error, whatever, we have
> to consider the question of when and whether this situation can be
> detected.
>
> Under a system in which we consider the attributes as part of the
> static type, and not part of the value, the presence or absence of
> join compatibility in a given query can be established by looking
> *only at the query* and not at the value of any of the operands.
> In other words, if the query is included in some source code, we
> can analyze the source code *without ever running it* and determine
> exactly the status of join compatibility. (Let me know if this needs
> further explanation.)
>
> Under a system in which we consider the attributes as part of the
> value, and under which we have a single universal type "relation";
> we *cannot* determine this ahead of time. Join compatibility checking
> *has to* be deferred until runtime, and it has to be checked
> every time a join occurs.

Agreed

> > > > I presume in a proper general purpose system strong typing of relvars
> > > > won't work. Having to specify the type of relvars would be painful.
> >
> > > Well, the term "strong typing" is fraught with peril, but assuming
> > > you mean "static typing" then I don't see why you think this
> > > would be an issue. Consider that a CREATE TABLE statement
> > > specifies the type of relvars.
> >
> > > > It would also disallow conditional projection, such as
> >
> > > > relation r = test ? r1 : proj(r1,B)
> >
> > > > ...not that I have an example of where that would be useful.
> >
> > > I think you will find that *every* language with a ? : operator
> > > does not allow the two possible results to have incompatible types.
> >
> > That's my point. That's why it seems better to have a
> > non-parameterized type called "relation".
>
> It absolutely is not "better." It is different. You gain some things
> and you lose some things. We could only consider it better if
> we a priori exclude some considerations and focus only on a
> subset.

I agree it won't be better in any absolute sense, because it depends on the situation.

However, I imagine in practice I would prefer the non-parameterised type for relations for most applications. One reason stems from the fact that RA expressions influence the type so having to specify the result type would be too painful.

> It is certainly entirely possible to have a dynamic relational
> language.
> It would be able to express more algorithms: those that are safe
> but not provably safe. It would not be well-suited to static analysis;
> we could not prove the absence of join compatibility for example.
>
> Different. A tradeoff.
>
>
> Marshall
>
> PS. As this thread develops, it begins to appear that the question of
> whether the header is part of the values is almost the same question
> as whether we have a static or a dynamic type system

In a sense. However note that in C++ (despite its static type system) I would normally avoid parameterisation of relation types because every distinct relation type would have to be compiled. Even worse a join operation would be parameterised on the type of both relations it joins on. Of course you can avoid the binary bloat by factoring out a weakly typed implementation from the templates. Another consideration is whether the proliferation of instantiations leads to better performance. However this would only be practical in a system that has completely fixed schema, known at compile time. It doesn't appear satisfactory for a general purpose RDBMS. Received on Wed Jan 17 2007 - 11:45:13 CET

Original text of this message