# Re: a union is always a join!

Date: Fri, 13 Mar 2009 03:38:05 -0700 (PDT)

Message-ID: <d827ba26-7c5c-4c6a-9393-836ef7687e65_at_d2g2000pra.googlegroups.com>

Reinier,

*> On Feb 18, 12:14 pm, rp..._at_pcwin518.campus.tue.nl (rpost) wrote:
*

There are some fundamentals of the relational model that you also
don't

understand.

*> Yes, but what I meant to say is that in general, the tuples
**> don't really express facts regarding those domain values,
**> they just help express information about other domain values.
*

Tuples do not express facts about domain values. They contain domain values.

A relation has an associated "external predicate" which is a
parameterized

statement about the modelled world. Eg "My dog named NAME has N
fleas".

Eg "N cats have M kittens".

All the statements you get by substituting the values from the tuples
that are

in the relation for the corresponding parameters are true. All the
statements

you get by substituting the value from the tuples that are not in the
relation

but are typed by it are false. You do not have any choice about it. A
relation

makes a statement for every tuple that is typed by it, not just the
ones in

the relation.

*> this is true for *any* operation with negation,
**> e.g. RELOR or MINUS. It is only more conspicuous in NOT and OR,
**> because they introduce domain values not in the original relation(s).
*

Everything R says is true, NOT R says is false, and everything R says
is false

NOT R says is true. They both use every possible tuple that is typed
by the

relation. You just don't have to write down the false ones.

This is fundamental. So make sure every thing you ever say about
relations is

consistent with this.

*> When I look at OR it doesn't
**> look like an operation I would use. But this may just be a matter
**> of getting used to it.
*

If relation R's predicate is R and relation S's predicate is S then:

(R JOIN S)'s predicate is (R and S) (so it has the tuples for which
that's true).

Eg "My dog is named NAME and has N fleas and (the same number) N cats
have M kittens".

(R PROJECT A)'s predicate is (there exists an A such that R) (so it
has the

tuples for which that's true). Eg "There exists a NAME and an N such
that my

dog is named NAME and has N fleas" (ie "My dog has fleas").
(R MINUS T)'s predicate is (R and not T) (so it has the tuples for
which that's

true) (but they have to use the same attributes). Eg "My dog is named
**NAME
**

and has N fleas and it's not the case that my friend NAME ate N
apples".

(R UNION T)'s predicate is (R or T) (so it has the tuples for which
that's true)

(but they have to use the same attributes). Eg "My dog is named NAME
and

has N fleas or my friend NAME ate N apples". But if you wanted to put
'or'

between any predicates use (R OR S): Eg "My dog is named NAME and has
N

fleas or I have (the same number) N cats and M kittens".

*> I always liked how Codd's algebra separates 'horizontal'
**> operations (PROJECT, JOIN) from 'vertical' ones (MINUS, UNION,
**> INTERSECTION, quotient, selection).
*

But this is just coincidental. If (for instance) you PROJECT away
some

attributes, it is because you want a result that can be described via
"there

exists" as above. It doesn't matter that you didn't know it could be
described

that way, that you were unable to use that to actually write the right
thing,

and were instead organizing yourself around these other vague notions
instead.

*> >The benefit is that NOT and OR can give clearer queries even if you
**> >limit yourself to recastable queries
*

*> Perhaps.
*

*> >The basic consequence is that the difference between operators and
**> >relations becomes only syntactic.
**>
**> To the mathematician; not to the computer scientist.
*

Suppose I want the NAMES, N and M for which
R and not (S or T).

Why shouldn't I be able to write

** R AND NOT(S OR T)
**

instead of something with MINUS, UNION, LEFT JOIN, etc? You are
making

me be a mathematician instead of a computer scientist, and in a case
where the computer could have done the work.

Why can't I use a relation as a boolean-result-operator (ie predicate)
or as a

function-operator (called using a superkey) as in:
// Numbers of cats and their numbers of kittens
// with the same number of kittens as Frank has fleas.
S where R(Frank NAME, M as N)

S where N=R(Frank as Name)

or write

A join (PLUS where P1=5)

An operator is a relation.

*> Therefore I was tempted to read
**> 'the tuples typed by R' as the join of the single-attribute
**> projections of R. Let's call this operation RELDOM(R),
**> and define RELNOT(R) = RELDOM(R) MINUS R. RELDOM can
**> (obviously) be expressed in Codd's algebra, albeit not
**> with a fixed expression.
*

*> Yes; and we can again introduce a relative variant RELOR
**> by using RELDOM(R JOIN Q) instead of DOM(R JOIN Q).
*

You can define them but I don't know what you think they mean or could
be

useful for (I think you don't know what they mean, and thus couldn't
use

them for anything). NOT and OR have simple interpretations; these do
not.

Can you give me a reference for McGoveran vis a vis these?

*> It complicates the implementation: it is no longer possible to represent
**> all relations as finite collections of tuples.
*

*> But the implementation has to treat PLUS,
**> and mathematical operations in general, differently from the
**> finite, extensionally defined, time-varying relations that
**> constitute the actual information stored in a relational database.
*

*> Lazy evaluation doesn't help for NOT.
*

You are confused if you think the implementation or representation
has

anything to do with what the meaning of relations or queries are. The
user

knows what the values of the relations mean. They are implemented
somehow. It doesn't matter how. It does not matter to the user
whether

they would be big or infinite if written out.

If I allow you to use NOT and OR according to some policy, then you
can use

it according to that policy. One example policy I gave was, you can
use them

all you want eg to make updates but when you actually ask for the
tuples of a

relation (expression) it has to be one that could have been expressed
without

them. In practice every query is going to be able to be expressed
without them

because why would you have wanted one of these enormous relations?

The implementation is trivial. When the user assigns an expression and
it has

NOT or OR, just remember the expression instead of evaluating it. Use
in

other expressions. Evaluate when you have to. Object when you can't.
That's it.

We have not introduced infinite relations. All we have is relations
that would be

really big were we to enumerate their tuples, that we are not
enumerating. But

that is also why if we do have infinite relations, they're not a
problem eiher.

So don't confuse behaviour issues and implementation issues.

philip

On Feb 18, 12:14 pm, rp..._at_pcwin518.campus.tue.nl (rpost) wrote:

*> philip wrote:
**> >Reinier,
**>
**> >There are fundamentals of Date& Darwen's "algebra" A that you don't
**> >understand.
**>
**> I understood the definitions (thanks for repeating them though)
**> but I did make a major mistake regarding OR.
**>
**> >Definitions:
**> >AND is JOIN.
**> >NOT R has exactly the tuples typed by R that are not in R.
**>
**> Note that 'the tuples typed by R' is a ambiguous.
**>
**> I was going by Darwen's appendix "A New Relational Algebra",
**> which doesn't mention types. Therefore I was tempted to read
**> 'the tuples typed by R' as the join of the single-attribute
**> projections of R. Let's call this operation RELDOM(R),
**> and define RELNOT(R) = RELDOM(R) MINUS R. RELDOM can
**> (obviously) be expressed in Codd's algebra, albeit not
**> with a fixed expression.
**>
**> However, what you actually mean by 'the tuples typed by R' is
**> the join of the single-attribute relations with all possible
**> values for types of the attribute. This is also an operation,
**> let's call it DOM. Then we can define NOT(R) = DOM(R) MINUS R.
**> This 'absolute' NOT generally introduces domain values
**> that weren't in R, and when the domains are big or infinite,
**> produces very big / infinite relations. It's this NOT that
**> I have problems with.
**>
**> >R OR Q has exactly those tuples typed by R JOIN Q which
**> >give a tuple from R when projected on R's attributes or give a tuple
**> >from Q when projected on Q's attributes.
**>
**> Yes; and we can again introduce a relative variant RELOR
**> by using RELDOM(R JOIN Q) instead of DOM(R JOIN Q).
**>
**> >> Darwen's AND, OR and NOT are more general operations than
**> >> those of the relational algebra, but they are not any more precise.
**>
**> >Agree.
**>
**> >> And the price he pays for allowing them is worse than computational:
**> >> the tuples of the relations we create with them can no longer all
**> >> be regarded as expressing positive facts.
**>
**> >Not so.
**> >Queries are interpreted exactly the same as always.
**> >Tuples that are present are true and those that are not present are
**> >false.
**>
**> Yes, but what I meant to say is that in general, the tuples
**> don't really express facts regarding those domain values,
**> they just help express information about other domain values.
**>
**> But I wasn't thinking straight when I wrote that (as paul c
**> suggested): this is true for *any* operation with negation,
**> e.g. RELOR or MINUS. It is only more conspicuous in NOT and OR,
**> because they introduce domain values not in the original relation(s).
**>
**> >> We *lose* preciseness.
**>
**> >Not so.
**>
**> OK.
**>
**> >> Darwen's algebra is[...]
**> >> *not* implementation oriented, because NOT and OR cannot actually
**> >> be implemented as they are (try NOT-ing the table {A: 1, B: 1, C: 1},
**> >> where A, B, C can take arbitrary floating point values).
**>
**> >Of course they can be implemented.
**> >It's true that they denote large relations, but that's not necessarily
**> >a problem.
**>
**> It complicates the implementation: it is no longer possible to represent
**> all relations as finite collections of tuples.
**>
**> >First, you could allow only queries in NOT and OR that can be recast
**> >using MINUS and UNION.
**> >Second, you could allow non-query expressions but not queries in NOT
**> >and OR.
**>
**> Yes, that was my suggestion: rewrite to Codd's algebra
**> and reject the rest. Another idea is, of course, to use
**> RELNOT and RELOR instead.
**>
**> >Third, the cost of a query in NOT or OR is not necessarily
**> >unacceptable.
**> >It depends on the size of the attribute types rather than the number
**> >of tuples in the relation,
**>
**> Definitely. For Booleans or enums it's acceptable
**> to just complement literally.
**>
**> >but you could allow them and and still compute them (eg if the result
**> >is small enough) (or in a lazy evaluation context).
**>
**> Lazy evaluation doesn't help for NOT. I think you want to be smarter
**> anyway. But I notice there are several implementations already and I
**> haven't got a clue what they do.
**>
**> >The benefit is that NOT and OR can give clearer queries even if you
**> >limit yourself to recastable queries
**> >(they roughly mean "not" and "or" in the sense that
**> >roughly JOIN means "and", MINUS means (limited) "and not" and UNION
**> >means (limited) "or").
**>
**> Perhaps. I always liked how Codd's algebra separates 'horizontal'
**> operations (PROJECT, JOIN) from 'vertical' ones (MINUS, UNION,
**> INTERSECTION, quotient, selection). When I look at OR it doesn't
**> look like an operation I would use. But this may just be a matter
**> of getting used to it.
**>
**> >> I would
**> >> venture that the first thing any implementer will attempt to do
**> >> is rewrite all of Darwen's expressions back to relational algebra as
**> >> far as possible, and throw errors on the remainder.
**>
**> >You use "venture", "attempt" and "as far as possible" but the
**> >semantics are clear.
**>
**> True.
**>
**> [...]
**>
**> >You can either think of PLUS as a named relation value or as a
**> >variable that doesn't happen to change.
**> >Otherwise everything is as usual.
**>
**> Mathematically, yes. But the implementation has to treat PLUS,
**> and mathematical operations in general, differently from the
**> finite, extensionally defined, time-varying relations that
**> constitute the actual information stored in a relational database.
**>
**> [...]
**>
**> >In your informal terms, PLUS is information about the world; that info
**> >just doesn't happen to change.
**>
**> No, it is not. The implementation can, and had better, take advantage
**> of the invariability of PLUS, and the typical computer hardware's
**> buil-in support for it. You really really don't want to implement it
**> as table lookup. Even for e.g. Boolean operators you probably want to
**> optimize differently than you they will be optimized when implemented
**> as table lookup.
**>
**> >> Nice as an academic exercise (look, ma, I removed a concept)
**> >> but not as an academic result (no useful purpose).
**>
**> >It is practical and not an exercise to simplify and generalize.
**> >The basic consequence is that the difference between operators and
**> >relations becomes only syntactic.
**>
**> To the mathematician; not to the computer scientist.
**>
**> >You can use a relation wherever you can use an operator and vice versa.
**> >This can be extended to include user-defined operations for which, again,
**> >the complexity for recastable queries is the same as the recast query,
**> >and you need a policy for non-recastable queries (and further policies
**> >for side-effects).
**> >If the D&D presentation were more formal and complete this would be
**> >more evident.
**>
**> It is evident already, but I think this policy stuff is hard.
**>
**> >BTW, using NOT and OR is equivalent to using MINUS and UNION along
**> >with a named single-attribute relation value
**> >for each type each containing all that type's values.
**>
**> Yes (DOM above).
**>
**> >Such a database has the same operation complexity as usual but some of
**> >the leaves (the type-relations) are big.
**> >So the cost of evaluating corresponding queries is the same.
**> >Again, it is notationally more elegant and exactly as computationally
**> >expensive
**> >to allow the latter with the same recast-restrictions you would apply
**> >to the former.
**>
**> As I already said in my previous message, I see the elegance,
**> but at the same time I find elegance in Codd's use of two kinds
**> of operations. Obviously if you rewrite to Codd's algebra (or do
**> something equivalent to that) you're not going to lose efficiency
**> on the cases where this is possible. But just rejecting the rest
**> is only going to confuse the user. Why not be honest and admit
**> that we really expect, in our implementations, relational queries
**> to work on finite, typically small relations, that WORKS_IN is one
**> and PLUS is not, and that NOT doesn't do so?
**>
**> >philip
**>
**> --
**> Reinier
*

Received on Fri Mar 13 2009 - 11:38:05 CET