Re: a union is always a join!

From: <compdb_at_hotmail.com>
Date: Thu, 29 Jan 2009 14:05:55 -0800 (PST)
Message-ID: <4402d922-195a-44f5-a0e2-94efb514a037_at_p23g2000prp.googlegroups.com>



On Jan 8, 11:21 am, rp..._at_pcwin518.campus.tue.nl (rpost) wrote:

Reinier,

There are fundamentals of Date& Darwen's "algebra" A that you don't understand.

Definitions:
AND is JOIN.
NOT R has exactly the tuples typed by R that are not in R. R OR Q has exactly those tuples typed by R JOIN Q which give a tuple from R when projected on R's attributes or give a tuple from Q when projected on Q's attributes. R MINUS (for permitted R and Q) is R AND NOT Q. R MINUS (for permitted R and Q) Q is also R OR Q.

> Darwen's AND, OR and NOT are more general operations than
> those of the relational algebra, but they are not any more precise.

Agree.

> And the price he pays for allowing them is worse than computational:
> the tuples of the relations we create with them can no longer all
> be regarded as expressing positive facts.

Not so.
Queries are interpreted exactly the same as always. Tuples that are present are true and those that are not present are false.

>  We now have to know how a
> resulting relation was produced before we can interpret it.
> We *lose* preciseness.

Not so.
Interpretation is the same as always.
I don't know what you think the problem is. Try to give an example.

> Darwen's algebra isn't any more axiomatic, it is *less* precise,
...
> and *not* implementation oriented, because NOT and OR cannot actually
> be implemented as they are (try NOT-ing the table {A: 1, B: 1, C: 1},
> where A, B, C can take arbitrary floating point values).

Of course they can be implemented.
It's true that they denote large relations, but that's not necessarily a problem.
First, you could allow only queries in NOT and OR that can be recast using MINUS and UNION.
Second, you could allow non-query expressions but not queries in NOT and OR.
Third, the cost of a query in NOT or OR is not necessarily unacceptable.
It depends on the size of the attribute types rather than the number of tuples in the relation,
but you could allow them and and still compute them (eg if the result is small enough) (or in a lazy evaluation context). The benefit is that NOT and OR can give clearer queries even if you limit yourself to recastable queries

(they roughly mean "not" and "or" in the sense that
roughly JOIN means "and",  MINUS means (limited) "and not" and UNION
means (limited) "or").

> I would
> venture that the first thing any implementer will attempt to do
> is rewrite all of Darwen's expressions back to relational algebra as
> far as possible, and throw errors on the remainder.

You use "venture", "attempt" and "as far as possible" but the semantics are clear.

>  Then again, there
> is no doubt a lot of theory on working with partly disjunctively or
> negatively interpreted relations that I'm not aware of.

Expressions in terms of NOT and OR are not interpreted differently. Expressions that cannot be recast using MINUS and UNION instead of NOT and OR
just might contain a lot of tuples.

> The same objection holds for treating selection as join with mathematical
> relations (in Darwen's section 'Treating operators as relations'.)
> A relation in a relational database is of a very different nature
> than a mathematical relation such as PLUS.  A database relation
> consists of a posteriori knowledge, information about the world.
> A mathematical relation is a priori: it reflects information
> about mathematical constructs, not information about the world.

Not so.
You can either think of PLUS as a named relation value or as a variable that doesn't happen to change.
Otherwise everything is as usual.
Eg R add C as A-B means R COMPOSE (PLUS rename ARG1 as C, ARG2 as B, RESULT as A)
(you need a parameter naming convention if you want to mix named with positional parameters).
In your informal terms, PLUS is information about the world; that info just doesn't happen to change.

> Nice as an academic exercise (look, ma, I removed a concept)
> but not as an academic result (no useful purpose).

It is practical and not an exercise to simplify and generalize. The basic consequence is that the difference between operators and relations becomes only syntactic.
You can use a relation wherever you can use an operator and vice versa.
This can be extended to include user-defined operations for which, again,
the complexity for recastable queries is the same as the recast query, and you need a policy for non-recastable queries (and further policies for side-effects).
If the D&D presentation were more formal and complete this would be more evident.

BTW, using NOT and OR is equivalent to using MINUS and UNION along with a named single-attribute relation value for each type each containing all that type's values. Such a database has the same operation complexity as usual but some of the leaves (the type-relations) are big. So the cost of evaluating corresponding queries is the same. Again, it is notationally more elegant and exactly as computationally expensive
to allow the latter with the same recast-restrictions you would apply to the former.

philip Received on Thu Jan 29 2009 - 16:05:55 CST

Original text of this message