Re: relations aren't types?
Date: 26 Dec 2003 23:46:18 -0800
> My thinking on this issue has changed a bit. Now it strikes me that the
> important question is: what is an atomic type? It does not make
> sense to have a unary relation over an unnamed atomic type. But
> considering a unary relation over a non-atomic type to be logically
> identical to an n-ary relation over the named subcomponents
> of the type strikes me as quite a valuable idea. The fact that the
> subcomponents already have names makes it easy.
Given a unary relation over some type, we could certainly invoke an operator with a single scalar-valued parameter using a conversion, and perhaps the compiler could even make it implicit as a shorthand. We could also generalize the notion and allow tuples and relations to be used as arguments to operators with multiple parameters, with binding occurring by name. Another interesting possibility would be to allow the binding to occur even if the attributes of the relation were a superset of the parameters of the operator, loosely speaking.

None of these proposals requires that the concepts of relation and type be identical. On the contrary, they rely heavily on the fact that the two concepts are distinct. In fact, I would assert that your question could be more accurately phrased: what is the difference between a relation type and a scalar type? For me, the answer is the level of abstraction at which the user needs to deal with the data being modeled. In order to project over a relation, you need to know the names of the attributes on which it is defined. In order to add two DateTime values together, it is not important that DateTimes have a possible representation with multiple components (year, month, day, etc.). It is easier, from a programmatic standpoint, to deal with DateTimes as atomic values.
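To make the by-name binding concrete, here is a minimal sketch in Python rather than in any actual database language. It assumes a tuple is modeled as a plain dict of attribute name to value; the helper invoke_by_name and the operator full_name are hypothetical names made up for illustration. Extra attributes are ignored, which is the "superset" case mentioned above.

```python
import inspect

def invoke_by_name(op, tup):
    """Call op, binding each parameter to the same-named attribute of tup.

    tup may carry more attributes than op has parameters (the superset
    case); it may not carry fewer.
    """
    params = inspect.signature(op).parameters
    missing = [name for name in params if name not in tup]
    if missing:
        raise TypeError(f"tuple lacks attributes: {missing}")
    # Pass only the attributes the operator actually declares.
    return op(**{name: tup[name] for name in params})

# A hypothetical operator with two scalar parameters.
def full_name(first, last):
    return f"{first} {last}"

# A tuple whose attributes are a superset of the operator's parameters.
employee = {"first": "Ada", "last": "Lovelace", "dept": "Engineering"}
print(invoke_by_name(full_name, employee))
```

Note that the operator itself still sees only scalar values; the binding step is what knows about attribute names, which is why the sketch needs both notions.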
> It is clear that system-defined types like int, char, and float
> are atomic. What's not so clear is whether user-defined types
> will be atomic. TTM seems to assume that they will be; I
> claim it is better if they aren't.
> Put another way: I don't see the value to user-defined opaque
> types. Clearly the system has to have a few opaque types:
> int, float, relation are necessarily opaque.
Relation is not an opaque type like int and float. In order to deal with values of a given relation type, I need to know the attributes of the type, in general.
> But if one is
> interested in data management, then one should not have
> opacity: you can't manage what you can't see. Because of
> this, I propose that all user-defined types *must* be transparent
> and not opaque. If that's so, then it makes sense to consider
> a relation as a generic type that can be parameterized with
> a *single* user defined type, which specifies the named
> attributes of the relation.
What about types like DateTime and TimeSpan? Clearly there is value
in defining a type whose possible representation includes multiple
components, each with an arbitrary type. We could certainly define a
language which did not allow user-defined types and only allowed the
user to define relation types over a predetermined set of native
types (int, float, string, maybe even DateTime), but we would then be
taking the extremely presumptuous position that we have correctly
identified the only possible native types of value to the user. I
would prefer to see a language that, like Tutorial D, allows the user
to define any type of interest and lets them decide what should be
modeled as a type and what should be modeled as a relation variable.
As an example of a unary relation variable over a "non-atomic" type,
> of some negative consequence (such as a contradiction or
> an ambiguity) would be helpful.
I suggest that SQL stands as a shining tribute to what happens when a language does not allow user-defined types.
> I'm less cautious. It strikes me that languages with support for generic
> programming have had substantial success, and I see no reason to
> believe their success won't transfer to a relation-centered language.
Not only do I agree, but I believe that there is a huge benefit to doing so. For example, I could define an event handler (trigger) which updated a column named LastUpdated to the current date and time. If the operator handling the event takes only a generic tuple as a parameter, the handler can be written generically and attached to any relation variable having at least an attribute named LastUpdated of type DateTime.
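As a rough illustration of such a generic handler, here is a sketch in Python. It assumes a tuple is a dict of attribute name to value and that "attaching" the handler simply means calling it on each updated tuple; the function touch_last_updated and the sample tuples are hypothetical, not part of any real system.

```python
from datetime import datetime, timezone

def touch_last_updated(row, now=None):
    """Return a copy of row with LastUpdated set to the current time.

    Works for any tuple that has at least a LastUpdated attribute of
    type datetime; all other attributes are carried over untouched.
    """
    if not isinstance(row.get("LastUpdated"), datetime):
        raise TypeError("row needs a LastUpdated attribute of type datetime")
    stamp = now if now is not None else datetime.now(timezone.utc)
    return {**row, "LastUpdated": stamp}

# Two tuples of different relation types share the one handler.
customer = {"ID": 1, "Name": "Acme",
            "LastUpdated": datetime(2003, 1, 1, tzinfo=timezone.utc)}
order = {"OrderNo": 42,
         "LastUpdated": datetime(2003, 6, 1, tzinfo=timezone.utc)}
updated_rows = [touch_last_updated(r) for r in (customer, order)]
```

The check at the top of the handler stands in for what a compiler would verify statically when the handler is attached to a relation variable.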
> I say: why bother with the conversion? Why not just declare the two
> cases as identical? I don't see any advantage to distinguishing between
> the two cases. (Maybe there is an implementation advantage.)
At least one advantage is brevity. This could certainly be claimed as an implementation advantage, but it is an important one. Indeed, I argued above that in order to realize this behavior, both concepts are necessary. While I do think that allowing implicit conversions of this type could be extremely useful in a database language, it must never be done at the expense of a more primitive behavior, and it must be done extremely carefully to avoid destroying the primary reason for types in the first place, namely semantic verification.
Received on Sat Dec 27 2003 - 08:46:18 CET