Re: a union is always a join!

From: rpost <rpost_at_pcwin518.campus.tue.nl>
Date: Wed, 18 Feb 2009 19:14:07 +0000 (UTC)
Message-ID: <gnhmlv$61$1_at_mud.stack.nl>


philip wrote:

>Reinier,
>
>There are fundamentals of Date& Darwen's "algebra" A that you don't
>understand.

I understood the definitions (thanks for repeating them though) but I did make a major mistake regarding OR.

>Definitions:
>AND is JOIN.
>NOT R has exactly the tuples typed by R that are not in R.

Note that 'the tuples typed by R' is a ambiguous.

I was going by Darwen's appendix "A New Relational Algebra", which doesn't mention types. Therefore I was tempted to read 'the tuples typed by R' as the join of the single-attribute projections of R. Let's call this operation RELDOM(R), and define RELNOT(R) = RELDOM(R) MINUS R. RELDOM can (obviously) be expressed in Codd's algebra, albeit not with a fixed expression.

However, what you actually mean by 'the tuples typed by R' is the join of the single-attribute relations with all possible values for types of the attribute. This is also an operation, let's call it DOM. Then we can define NOT(R) = DOM(R) MINUS R. This 'absolute' NOT generally introduces domain values that weren't in R, and when the domains are big or infinite, produces very big / infinite relations. It's this NOT that I have problems with.

>R OR Q has exactly those tuples typed by R JOIN Q which
>give a tuple from R when projected on R's attributes or give a tuple
>from Q when projected on Q's attributes.

Yes; and we can again introduce a relative variant RELOR by using RELDOM(R JOIN Q) instead of DOM(R JOIN Q).

>> Darwen's AND, OR and NOT are more general operations than
>> those of the relational algebra, but they are not any more precise.
>
>Agree.
>
>> And the price he pays for allowing them is worse than computational:
>> the tuples of the relations we create with them can no longer all
>> be regarded as expressing positive facts.
>
>Not so.
>Queries are interpreted exactly the same as always.
>Tuples that are present are true and those that are not present are
>false.

Yes, but what I meant to say is that in general, the tuples don't really express facts regarding those domain values, they just help express information about other domain values.

But I wasn't thinking straight when I wrote that (as paul c suggested): this is true for *any* operation with negation, e.g. RELOR or MINUS. It is only more conspicuous in NOT and OR, because they introduce domain values not in the original relation(s).

>> We *lose* preciseness.
>
>Not so.

OK.

>> Darwen's algebra is[...]
>> *not* implementation oriented, because NOT and OR cannot actually
>> be implemented as they are (try NOT-ing the table {A: 1, B: 1, C: 1},
>> where A, B, C can take arbitrary floating point values).
>
>Of course they can be implemented.
>It's true that they denote large relations, but that's not necessarily
>a problem.

It complicates the implementation: it is no longer possible to represent all relations as finite collections of tuples.

>First, you could allow only queries in NOT and OR that can be recast
>using MINUS and UNION.
>Second, you could allow non-query expressions but not queries in NOT
>and OR.

Yes, that was my suggestion: rewrite to Codd's algebra and reject the rest. Another idea is, of course, to use RELNOT and RELOR instead.

>Third, the cost of a query in NOT or OR is not necessarily
>unacceptable.
>It depends on the size of the attribute types rather than the number
>of tuples in the relation,

Definitely. For Booleans or enums it's acceptable to just complement literally.

>but you could allow them and and still compute them (eg if the result
>is small enough) (or in a lazy evaluation context).

Lazy evaluation doesn't help for NOT. I think you want to be smarter anyway. But I notice there are several implementations already and I haven't got a clue what they do.

>The benefit is that NOT and OR can give clearer queries even if you
>limit yourself to recastable queries
>(they roughly mean "not" and "or" in the sense that
>roughly JOIN means "and", MINUS means (limited) "and not" and UNION
>means (limited) "or").

Perhaps. I always liked how Codd's algebra separates 'horizontal' operations (PROJECT, JOIN) from 'vertical' ones (MINUS, UNION, INTERSECTION, quotient, selection). When I look at OR it doesn't look like an operation I would use. But this may just be a matter of getting used to it.

>> I would
>> venture that the first thing any implementer will attempt to do
>> is rewrite all of Darwen's expressions back to relational algebra as
>> far as possible, and throw errors on the remainder.
>
>You use "venture", "attempt" and "as far as possible" but the
>semantics are clear.

True.

[...]

>You can either think of PLUS as a named relation value or as a
>variable that doesn't happen to change.
>Otherwise everything is as usual.

Mathematically, yes. But the implementation has to treat PLUS, and mathematical operations in general, differently from the finite, extensionally defined, time-varying relations that constitute the actual information stored in a relational database.

[...]

>In your informal terms, PLUS is information about the world; that info
>just doesn't happen to change.

No, it is not. The implementation can, and had better, take advantage of the invariability of PLUS, and the typical computer hardware's buil-in support for it. You really really don't want to implement it as table lookup. Even for e.g. Boolean operators you probably want to optimize differently than you they will be optimized when implemented as table lookup.

>> Nice as an academic exercise (look, ma, I removed a concept)
>> but not as an academic result (no useful purpose).
>
>It is practical and not an exercise to simplify and generalize.
>The basic consequence is that the difference between operators and
>relations becomes only syntactic.

To the mathematician; not to the computer scientist.

>You can use a relation wherever you can use an operator and vice versa.
>This can be extended to include user-defined operations for which, again,
>the complexity for recastable queries is the same as the recast query,
>and you need a policy for non-recastable queries (and further policies
>for side-effects).
>If the D&D presentation were more formal and complete this would be
>more evident.

It is evident already, but I think this policy stuff is hard.

>BTW, using NOT and OR is equivalent to using MINUS and UNION along
>with a named single-attribute relation value
>for each type each containing all that type's values.

Yes (DOM above).

>Such a database has the same operation complexity as usual but some of
>the leaves (the type-relations) are big.
>So the cost of evaluating corresponding queries is the same.
>Again, it is notationally more elegant and exactly as computationally
>expensive
>to allow the latter with the same recast-restrictions you would apply
>to the former.

As I already said in my previous message, I see the elegance, but at the same time I find elegance in Codd's use of two kinds of operations. Obviously if you rewrite to Codd's algebra (or do something equivalent to that) you're not going to lose efficiency on the cases where this is possible. But just rejecting the rest is only going to confuse the user. Why not be honest and admit that we really expect, in our implementations, relational queries to work on finite, typically small relations, that WORKS_IN is one and PLUS is not, and that NOT doesn't do so?

>philip

-- 
Reinier
Received on Wed Feb 18 2009 - 20:14:07 CET

Original text of this message