Re: choice of character for relational division

From: Marshall <marshall.spight_at_gmail.com>
Date: 2 Apr 2007 12:00:43 -0700
Message-ID: <1175540443.603337.243900_at_d57g2000hsg.googlegroups.com>


On Apr 2, 11:31 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
> Marshall wrote:
> > On Apr 1, 10:04 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
>
>
> Are you saying your language will use set algebra plus quantification
> for everything?

No. I think of three separate aspects to the language.

  1. The type system (entirely compile time)
  2. The computation system (entirely runtime)
  3. The constraint system (mix of runtime, compile time)

Most languages have only 1 and 2. (Some have only 2!)

In my language, 1 is decidedly on the lame side. (Because many of its features have moved into 3.) It is essentially just the minimum necessary to ensure that 2 behaves in a well-defined, relational way.

2) is purely algebraic.

3) uses the same language as 2 but with the addition of quantification.

I could perhaps be talked out of adding quantification to 3. It just seems to make constraint writing easier, but now that the question comes up, I really haven't done much to justify that statement.

My ambitions for the constraint system are extremely high, probably unrealistically so.

> >>So, a join is the inverse of a divide, but we don't necessarily have an
> >>inverse of a join. For that, we would have to specify the set of join
> >>attributes, and the result would necessarily omit any unmatched tuples
> >>that were in the original relations. Hmmmm... off the top of my head I
> >>wonder if such a thing would even have a unique solution?
>
> > The "inverse" here is the inverse the way div is the inverse of
> > int multiply. There are remainders. And yes, the solution is not
> > unique in general because of the specification of attributes
> > issue. However there is a unique minimal solution: the one that
> > uses the fewest attributes.
>
> >>>>ItsEarlyAndICan
> >>>>A B C
> >>>>1 1 b
> >>>>1 2 a
> >>>>1 3 c
> >>>>2 11 d
>
> >>>>What does "ItsEarlyAndICan $ (Total = sum(B))" evaluate to?
>
> >>>ItsEarlyAndICan $ (Total = sum(B))
> >>>A C Total
> >>>1 b 1
> >>>1 a 2
> >>>1 c 3
> >>>2 d 11
>
> >>>In an aggregation expression A over a relation R, let us partition the
> >>>attributes of R into three sets:
> >>>1) Those that are used as attributes to group by
> >>>2) Those that are used as parameters to aggregate functions
> >>>3) Those that are discarded
>
> >>>In SQL GROUP BY, 1 and 2 are explicit in the expression, and 3 is
> >>>defined by its omission. This is convenient.
>
> >>>In relational division, 2 and 3 are explicit in the expression, and
> >>>1 is defined by its omission. This is inconvenient, but necessary
> >>>to be consistent with being the inverse of join. :-(
>
> >>How would one have to alter the expression above to discard C?
>
> > Any of the three ways I list immediately below:
>
> > 1) Multiply the dividend by the projection of the divisor over
> > the attribute you want to discard (multiply here is cartesian
> > product)
> > 2) Multiply the dividend by the domain of the attribute you
> > want to discard
>
> I don't get it. How does multiplying {A, B, C} by {C} get rid of C?

Crap. I meant divisor. It's noon and I haven't had breakfast yet.

Rewritten:

  1. Multiply the divisor by the projection of the dividend over the attribute you want to discard (multiply here is cartesian product)
  2. Multiply the divisor by the domain of the attribute you want to discard

> ie. (ItsEarlyAndICan & {C}) $ (Total = sum(b))
>
> Did you meant to say "multiply the divisor by the projection of the
> dividend over the attribute you want to discard" ?

Yeah.

>
>
> > 3) Add a "Highly Error Correlated" parameter to the aggregate
> > function. (That is, a parameter that isn't used in calculating
> > the aggregate.)
>
> > This is more an issue for the internal language than the
> > external one; I don't expect hardly anyone to be interested
> > in these issues for practical reasons; I'd expect people to
> > use the GROUP BY construct that behaves in a more
> > convenient and intuitive way.
>
> > In fact, I'm not even altogether sure that it's worth putting
> > this into the internal language. But it's interesting to
> > do the math.
>
> >>>So I would expect to add to the external language a construct
> >>>similar to SQL's GROUP BY that was defined in terms of
> >>>relational division. (Yet unlike in SQL was algebraic.)
>
> >>>Amusingly, we convert the division into a GROUP BY by
> >>>multiplying the dividend. Adding attributes to the dividend
> >>>removes them from the result. We can multiply by either
> >>>the projection of the divisor over the attributes we wish
> >>>to remove from the result, or by the entire domain of
> >>>the attributes we wish to remove.
>
> >>>Or, in a third possibility, we can add those attributes as
> >>>parameters to the aggregate function. So if we wanted
> >>>to take ItsEarly and compute "sum(C) group by A" we would
> >>>use an aggregate function sum', which took B as a parameter
> >>>but didn't inspect that parameter.
>
> >>>def sum'(B, C) = sum(B)
>
> >>>ItsEarly $ (Total = sum'(B, C))
> >>>A Total
> >>>1 6
> >>>2 11
>
> >>That seems quite yucky from an aesthetic point of view.
>
> > Which one is yucky? The third one? All of them?
>
> The third one. The others simply did not make sense to me. I understood
> the third one which I found kludgy or hackish -- in short yucky.

Okay.

> >>>Lately I am running in to that construct: an ignored function
> >>>parameter.
>
> >>>f:(a, b) | forall b: forall b': f(a, b) = f(a, b')
>
> >>>Does this have a name?
>
> >>Um, in programming I would call it "highly error-correlated". HEC for
> >>short, maybe?
>
> > Heh.
>
> Seriously. From what I recall, unused parameters and unused local
> variables correlate very highly with bugs.

No, I completely buy it. But remember I'm talking language semantics here, not techniques for writing code. The division part I don't expect will get direct use, however it is useful to understand
its mathematical properties.

Marshall Received on Mon Apr 02 2007 - 21:00:43 CEST

Original text of this message