Re: a union is always a join!

From: rpost <rpost_at_pcwin518.campus.tue.nl>
Date: Thu, 8 Jan 2009 19:21:59 +0000 (UTC)
Message-ID: <gk5jon\$22sv\$1_at_mud.stack.nl>

paul c wrote:

>rpost wrote:
>> paul c wrote:
>>
>>> Criticisms, please (preferably ones based on some formal logic or other).
>>>
>>> It is inescapable that every relation is a join (eg., Heath's theorem).
>>
>> As Brian writes, this is one step too far. Every relation can be
>> decomposed into smaller ones based on its nontrivial functional
>> dependencies, but if it hasn't any - and the resulting relations don't -
>> it is only a join of itself.
>> ...
>
>All relations have at least n+1 fd's, even if trivial, where n is the
>number of attributes. All relations with at least one attribute (all
>but 'Dee') are a join of at least two (non-empty) relations.

Yes, trivial ones: one is the relation itself. Nontrivial ones exist only when the relation has (nontrivial) join dependencies, e.g. a (nontrivial) key.

>> Perhaps by join you mean something like the AND in Darwen's 'A New
>> Relational Algebra' (thanks for giving its URL earlier), which has the
>> join and and union of relational algebra as special cases. We can also
>> decompose relations by writing them as the unions of other relations
>> (someone I know wrote a PhD thesis about it), but Heath isn't about that.
>> ...
>
><AND> or the equivalent in another algebra is the only way to understand
>what join and union really mean, ie., are actually capable of being
>interpreted as.

I don't understand this, they are orthogonal operations, as I explained.

>Some 'unions' can only be expressed as the union of
>themselves and an empty relation.

I don't understand your interest in trivial unions or trivial joins.

>I think it is helpful to understand
>union from some formal definition that acknowledges the logical
>possibility that two relations may not be 'union' compatible, eg., the
>'<OR>' operation or equivalent.

The relational algebra union does so by not being defined for those cases, but I agree with your and Darwen's sentiment that it is nicer to extend operations to cover all cases when this can be usefully done, in fact I've worked on a similar generalization of operations for a different query language. But this doesn't change the fact that union and join are fundamentally orthogonal, in the sense that one combines relations vertically, the other horizontally.

So I do sympathize with his OR and NOT; but I also feel the problems with them are more than 'computational issues'.

>Understanding relation structure based
>on empty relations seems as pointless/useless to me as understanding
>relational complement as being 'simple' complement. In fact, I'd rather
>call simple complement the vague complement and relative complement the
>exact complement. Why eschew a more precise interpretation when it's
>staring us in the face?

I'm trying to make sense of this by reading Darwen's NOT for 'relational complement' and relational algebra's MINUS for 'simple complement'. Is that what you mean?

In that case, I don't see what you mean by preciseness. Darwen's AND, OR and NOT are more general operations than those of the relational algebra, but they are not any more precise. And the price he pays for allowing them is worse than computational: the tuples of the relations we create with them can no longer all be regarded as expressing positive facts. We now have to know how a resulting relation was produced before we can interpret it. We *lose* preciseness.

>> What are you trying to achieve?
>> ...
>
>Originally, the most potent axioms for an implementation language that
>follows McGoveran's suggestions, ie., ones that recognize relation
>structure as opposed to how a relation value may have been arbitrarily
>formed.

*Implementation* language?

Relational algebra (with possible extensions such as transitive closure) already gives us both mathematical simplicity (the axiom aspect) and implementation orientedness (it is targeted at optimization). Darwen's algebra isn't any more axiomatic, it is *less* precise, as we no longer know how to interpret the relations we produce. and *not* implementation oriented, because NOT and OR cannot actually be implemented as they are (try NOT-ing the table {A: 1, B: 1, C: 1}, where A, B, C can take arbitrary floating point values). I would venture that the first thing any implementer will attempt to do is rewrite all of Darwen's expressions back to relational algebra as far as possible, and throw errors on the remainder. Then again, there is no doubt a lot of theory on working with partly disjunctively or negatively interpreted relations that I'm not aware of.

The same objection holds for treating selection as join with mathematical relations (in Darwen's section 'Treating operators as relations'.) A relation in a relational database is of a very different nature than a mathematical relation such as PLUS. A database relation consists of a posteriori knowledge, information about the world. A mathematical relation is a priori: it reflects information about mathematical constructs, not information about the world.

A database relation is established by someone or something who observes facts in the world and makes sure the tuples of the relation reflect them. The reason databases exist is that we want to store and then retrieve such information. We typically want to be able to write a query that produces, say, the relation reflecting the municipalities of the country with their mayors. Typically, we want such a query to always produce the latest known version of that information without having to specifically rewrite it to make it do so; in which case the relation is 'time-varying', i.e. its contents will need to be updated from time to time.

None of this is true for mathematical relations. They are invariant. Most are infinite. It is indeed mathematically possible to express arbitrary selection expressions in terms of relational joins, but this only makes them hard to read and write, it obscures implementation aspects such as performance, and I can't think of a benefit. Nice as an academic exercise (look, ma, I removed a concept) but not as an academic result (no useful purpose).

>The way most people talk of Codd's algebra is simplistic
>because they have typically not tried to interpret it, probably have
>never even read it, for example, in the above, what do horizontal and
>vertical coordinates have to do with anything relational?.

You know very well I wasn't talking about coordinates, it was just a shorthand that is completely *obvious* (considering that we usually visualize relations as tables) and that I *carefully explained* just to be sure you would get it. If you still didn't, I'm really sorry but I don't think it's my fault.

>This is faux
>technocracy, usually involving what de Bono called porridge words which
>in this case are additional lingo to disguise a lack of basic understand

Covering up a lack of reading comprehension with insults, as you have a habit of doing, doesn't exactly help your credibility, mister. Do you want help on understanding relational theory, or don't you?

>(and which is how I believe Codd's algebra is usually 'taught', or
>thought to be 'taught'. But more to the point, I continue to be amazed
>that people claim they can 'insert' a tuple to a relvar but cannot
>'delete' that very tuple! (vice-versa too.)

I can make sense of that remark in various ways so I'm not sure how to respond. In relational algebra this is well possible, because of the possible creation of symmetries that selections cannot 'break'. Please clarify.

>I think when they say they
>can't delete, what they are really saying is that they can't delete a
>tuple that
> is not in the join! The same goes for asserting and retracting a
>proposition in general.
> I've no doubt that it might be difficult to
>retract a proposition that one has never asserted! In a way, this is no
>surprise to me and I still don't know why people think it is a problem,

I'm not sure what you mean. Maybe it depends on what kinds of facts a database is assumed to contain; assuming that every tuple corresponds to a fact and absence of a tuple to its negation really simplifies things, but I don't know if you do. (Does make any sense as a reply?)

>when all human affairs contain much mumbo-jumbo that is logically
>make-believe. Now that you mention it, another goal would be to prevent
>geometric interpretations of the basic algebraic structures! Sounds
>like tables or arrays are getting mixed up with relations.

Yes, to you it does. You see a word, it triggers an association in your mind, and you start blathering about the association instead of focusing on trying to understand what is actually meant. You're in bad company here, and I certainly won't exclude myself, but if you want to make progress on this algebra stuff, a change of attitude is advisable.

---
Reinier Received on Thu Jan 08 2009 - 13:21:59 CST

Original text of this message