Re: choice of character for relational division

From: Marshall <marshall.spight_at_gmail.com>
Date: 2 Apr 2007 10:05:15 -0700
Message-ID: <1175533515.491199.294900_at_y80g2000hsf.googlegroups.com>


On Apr 1, 10:04 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
> Marshall wrote:
>
> >>But we do ask whether one set is equal to another set. Are you
> >>constructing a calculus or an algebra?
>
> > An algebra. (However the constraint language is a calculus.
> > I am unclear if this is an issue. Like JOG's niggle over
> > directed vs. undirected edges, I am unsure whether the thing
> > to do is just "get over it.")
>
> I would tend to use a single language for both derivation and integrity
> constraints. Why force anyone to learn two equivalent languages?

A fair question.

First let me say that despite some serious searching, I've been unable to come up with a firm definition of either "calculus" or "algebra."

The basic language is algebraic. Things are wonderfully expressive this way, and nicely composable.

Not sure I'll get his right, but let's look at an expression "find all the parts that are supplied by suppliers in Los Angeles."

Algebra:

  SELECT PartNo from Parts natural join Suppliers where City = "Los Angeles"

or in my notation:

  Suppliers & city = "Los Angeles" & SupplierParts | PartNo

Calculus:

  forall (SupplierNo, City) in Suppliers:     forall (SupplierNo', PartNo) in SupplierParts:       SupplierNo = SupplierNo'

(How do you project in the calculus?)

Set Builder:

  {(PartNo) | (SupplierNo, City) ∈ Suppliers, (SupplierNo, PartNo) ∈ SupplierParts }

The algebra seems the clearest to me, although set builder is pretty good too.

The calculus for the constraint language is really just the algebra with forall + exits, ranging over relation attributes, added in. It seems the easiest way to write constraints.

Anyway, I dunno for sure, but I'll try it and if it doesn't work I'll try something else. (That's sorta my motto lately.)

> So, a join is the inverse of a divide, but we don't necessarily have an
> inverse of a join. For that, we would have to specify the set of join
> attributes, and the result would necessarily omit any unmatched tuples
> that were in the original relations. Hmmmm... off the top of my head I
> wonder if such a thing would even have a unique solution?

The "inverse" here is the inverse the way div is the inverse of int multiply. There are remainders. And yes, the solution is not unique in general because of the specification of attributes issue. However there is a unique minimal solution: the one that uses the fewest attributes.

> >>ItsEarlyAndICan
> >>A B C
> >>1 1 b
> >>1 2 a
> >>1 3 c
> >>2 11 d
>
> >>What does "ItsEarlyAndICan $ (Total = sum(B))" evaluate to?
>
> > ItsEarlyAndICan $ (Total = sum(B))
> > A C Total
> > 1 b 1
> > 1 a 2
> > 1 c 3
> > 2 d 11
>
> > In an aggregation expression A over a relation R, let us partition the
> > attributes of R into three sets:
> > 1) Those that are used as attributes to group by
> > 2) Those that are used as parameters to aggregate functions
> > 3) Those that are discarded
>
> > In SQL GROUP BY, 1 and 2 are explicit in the expression, and 3 is
> > defined by its omission. This is convenient.
>
> > In relational division, 2 and 3 are explicit in the expression, and
> > 1 is defined by its omission. This is inconvenient, but necessary
> > to be consistent with being the inverse of join. :-(
>
> How would one have to alter the expression above to discard C?

Any of the three ways I list immediately below:

  1. Multiply the dividend by the projection of the divisor over the attribute you want to discard (multiply here is cartesian product)
  2. Multiply the dividend by the domain of the attribute you want to discard
  3. Add a "Highly Error Correlated" parameter to the aggregate function. (That is, a parameter that isn't used in calculating the aggregate.)

This is more an issue for the internal language than the external one; I don't expect hardly anyone to be interested in these issues for practical reasons; I'd expect people to use the GROUP BY construct that behaves in a more convenient and intuitive way.

In fact, I'm not even altogether sure that it's worth putting this into the internal language. But it's interesting to do the math.

> > So I would expect to add to the external language a construct
> > similar to SQL's GROUP BY that was defined in terms of
> > relational division. (Yet unlike in SQL was algebraic.)
>
> > Amusingly, we convert the division into a GROUP BY by
> > multiplying the dividend. Adding attributes to the dividend
> > removes them from the result. We can multiply by either
> > the projection of the divisor over the attributes we wish
> > to remove from the result, or by the entire domain of
> > the attributes we wish to remove.
>
> > Or, in a third possibility, we can add those attributes as
> > parameters to the aggregate function. So if we wanted
> > to take ItsEarly and compute "sum(C) group by A" we would
> > use an aggregate function sum', which took B as a parameter
> > but didn't inspect that parameter.
>
> > def sum'(B, C) = sum(B)
>
> > ItsEarly $ (Total = sum'(B, C))
> > A Total
> > 1 6
> > 2 11
>
> That seems quite yucky from an aesthetic point of view.

Which one is yucky? The third one? All of them?

> > Lately I am running in to that construct: an ignored function
> > parameter.
>
> > f:(a, b) | forall b: forall b': f(a, b) = f(a, b')
>
> > Does this have a name?
>
> Um, in programming I would call it "highly error-correlated". HEC for
> short, maybe?

Heh.

Marshall Received on Mon Apr 02 2007 - 19:05:15 CEST

Original text of this message