Re: completeness of the relational lattice
Date: Sun, 24 Jun 2007 23:32:48 -0000
Message-ID: <1182727968.713590.257140_at_q69g2000hsb.googlegroups.com>
On 24 jun, 00:11, Vadim Tropashko <vadimtro_inva..._at_yahoo.com> wrote:
> On Jun 23, 6:53 am, Jan Hidders <hidd..._at_gmail.com> wrote:
>
>
>
> > Btw. let me give a rough sketch of what my plan of attack is going to
> > be. It takes very big steps and ignores important technical issues,
> > but it should give you an idea of what my ideas are. The proof is
> > going to proceed in several stages with several normal forms. The
> > syntax I'll start from is:
>
> > E ::= R | E /\ E | E \/ E | 11 | [H] | 'x=y'
>
> > I allow H to be empty, so I omitted 00 = [], 01 = 11 \/ [] and 10 =
> > 11 /\ []
>
> > The first stage will be a normalization to a "union normal form" that
> > I already talked about. This normal form means that the expression is
> > in a form r_1 \/ ... \/ r_n with the r_i such that the contain no \/
> > except in the form s \/ [H]. I will refer to these r_i's as the
> > disjuncts.
>
> Let see if I can follow this. Say we have an expression from my other
> post:
>
> (`x=y`) \/ R \/ ( ((R /\ `y=z`) \/ [x,z]) /\ ((R /\ `x=z`) \/ [z,y]) )
>
> It is in union normal form, right? And
>
> ( (`x=z`) \/ ( (R /\ `y=z`) \/ [x,z] ) /\ ( (`z=y`) \/ ( (R /\ `x=z`)
> \/ [z,y]) )
>
> is not?
Correct.
> > Conjecture: we have enough rewrite rules to achieve this normal form.
>
> While the above example can be worked out to demonstrate that both
> expressions are equivalent, it fairly easy to cook up the example
> which can't be reduced to union normal form:
>
> Let R(u,w), S(u), Q(w), T(V), P(u,v), then we can't apply
> distributivity to simplify
>
> ( (R /\ (S \/ Q)) \/ T ) /\ P
This is solved by the extra rules for [H]. These allow me to make the attributes of an expression explicit by rewriting it to r \/ [H] where H contains all attributes of r. That means that in the above you can rewrite S \/ Q to (S \/ [H1]) \/ (Q \/ [H2]) which in turn can be rewritten to (S \/ [H1*H2]) \/ (Q \/ [H1*H2]). By then distribtivity will apply.
> > Some of the disjuncts will be conjunctive queries (CQ) and some will
> > not. There are three reasons why some are not in CQ: (1) the result
> > may be always empty, (2) the result is (infinitely) wide and (3) the
> > result is (infinitely) long.
>
> I don't follow this, the example maybe helpful. I can't resist a
> feeling that those appears to be merely some special cases.
True, but for the moment I had to leave out the infinitely wide relations, because I didn't see immediately how to deal with them. I might try again later.
> > Conjecture: if a disjunct always returns an empty result it can be
> > rewritten to []
> In the example above
>
> ( (R /\ (S \/ Q)) \/ T ) /\ P
>
> suppose the join of P with the rest of the expression is empty. How do
> we collapse it to []?
With the extra rules for [] that I haven't told you about yet. :-)
Maybe at this point I should simply show you the rules that I think are necessary. I'm not completely sure they are really complete, because I haven't finished the proof completely yet, but I suspect it is close.
While working on the proof I changed the fragment a little, and removed the infinitely wide relations because they were hard to deal with. I also reverted to my old notation for the natural join and inner join. My apologies for that.
We consider the following algebra:
E ::= R | E * E | E + E | {()} | [H] | 'x=y'
Where:
The * and + will also be used as intersection and union on sets of
attributes.
We define a function A that computes the header of the result:
Standard equalities:
Special distribution equalities:
(8) r * (s + t) = (r * s) + (r * t) if A(r) * A(s) <= A(t) or
A(r) * A(t) <= A(s)
- R : a relation with name R (and known header)
- E * E : natural join
- E + E : generalized union
- {()} : the relation with empty header and a single empty tuple
- [H] : the empty relation with header H (possibly empty)
- 'x=y' : the relation { (x=a, y=b) | a in D, b in D, a=b }
- A(R) = H where H is the header of R
- A(r * s) = A(r) + A(s)
- A(r + s) = A(r) * A(s)
- A({()}) = {}
- A([H]) = H
- A('x=y') = {x,y}
(1) r * r = r
(2) r * s = s * r
(3) r * (s * t) = (r * s) * t
(4) r + r = r
(5) r + s = s + r
(6) r + (s + t) = (r + s) + t
(7) r * {()} = r
(13) {()} = {()} + []
(9) r + (s * t) = (r + s) * (r + t) if A(s) * A(t) <= A(r)
Absorption:
(20) r + (r * s) = r
(21) r * (r + s) = r
Empty relations:
(14) 'x=y' = 'x=y' + [x,y] (10) R = R + [H] if H is the header of R (11) [H] * [S] = [H + S] (12) [H] + [S] = [H * S] (22) R * [] = [H] if H is the header of R
Equalities
(15) 'x=y' = 'x=y' * 'x=x' (16) 'x=y' = 'x=y' * 'y=x' (17) 'x=y' * 'y=z' = 'x=y' * 'y=z' * 'x=z' (18a) R * 'x=x' = R if x in header R
Miscellaneous
(18b) [] * 'x=y' = [x,y] (18c) 'x=y' + [H] = 'x=x' if x in H and y not in H (19) 'x=y' + [] = {()} (24) ((r * 'x=y') + [H]) * 'x=y' = ((r * 'x=y') + [H + {y}]) if xin H
That's it. Apologies for the weird numbering, it reflects the order in which I needed them in the proof.
- Jan Hidders