Re: choice of character for relational division

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Mon, 02 Apr 2007 20:13:19 GMT
Message-ID: <ztdQh.18596$PV3.192408_at_ursa-nb00s0.nbnet.nb.ca>


Marshall wrote:

> On Apr 2, 11:31 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
>

>>Marshall wrote:
>>
>>>On Apr 1, 10:04 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
>>
>>Are you saying your language will use set algebra plus quantification
>>for everything?

>
> No. I think of three separate aspects to the language.
>
> 1) The type system (entirely compile time)
> 2) The computation system (entirely runtime)
> 3) The constraint system (mix of runtime, compile time)
>
> Most languages have only 1 and 2. (Some have only 2!)
>
> In my language, 1 is decidedly on the lame side. (Because
> many of its features have moved into 3.) It is essentially
> just the minimum necessary to ensure that 2 behaves
> in a well-defined, relational way.
>
> 2) is purely algebraic.
>
> 3) uses the same language as 2 but with the addition of
> quantification.
>
> I could perhaps be talked out of adding quantification to 3.
> It just seems to make constraint writing easier, but now
> that the question comes up, I really haven't done much
> to justify that statement.
>
> My ambitions for the constraint system are extremely
> high, probably unrealistically so.

If quantification is useful for the constraint system, why don't you consider it equally useful for the computation system? Have you gotten ahold of Codd's 1972 paper?

Will the type system support generic types or what Date et al call "type generators" ? Will it have a user-extensible generic type facility?

>>>>So, a join is the inverse of a divide, but we don't necessarily have an
>>>>inverse of a join. For that, we would have to specify the set of join
>>>>attributes, and the result would necessarily omit any unmatched tuples
>>>>that were in the original relations. Hmmmm... off the top of my head I
>>>>wonder if such a thing would even have a unique solution?
>>
>>>The "inverse" here is the inverse the way div is the inverse of
>>>int multiply. There are remainders. And yes, the solution is not
>>>unique in general because of the specification of attributes
>>>issue. However there is a unique minimal solution: the one that
>>>uses the fewest attributes.
>>
>>>>>>ItsEarlyAndICan
>>>>>>A    B    C
>>>>>>1    1    b
>>>>>>1    2    a
>>>>>>1    3    c
>>>>>>2    11   d
>>
>>>>>>What does "ItsEarlyAndICan $ (Total = sum(B))" evaluate to?
>>
>>>>>ItsEarlyAndICan $ (Total = sum(B))
>>>>>A C Total
>>>>>1 b 1
>>>>>1 a 2
>>>>>1 c 3
>>>>>2 d 11
>>
>>>>>In an aggregation expression A over a relation R, let us partition the
>>>>>attributes of R into three sets:
>>>>>1) Those that are used as attributes to group by
>>>>>2) Those that are used as parameters to aggregate functions
>>>>>3) Those that are discarded
>>
>>>>>In SQL GROUP BY, 1 and 2 are explicit in the expression, and 3 is
>>>>>defined by its omission. This is convenient.
>>
>>>>>In relational division, 2 and 3 are explicit in the expression, and
>>>>>1 is defined by its omission. This is inconvenient, but necessary
>>>>>to be consistent with being the inverse of join. :-(
>>
>>>>How would one have to alter the expression above to discard C?
>>
>>>Any of the three ways I list immediately below:
>>
>>>1) Multiply the dividend by the projection of the divisor over
>>>the attribute you want to discard (multiply here is cartesian
>>>product)
>>>2) Multiply the dividend by the domain of the attribute you
>>>want to discard
>>
>>I don't get it. How does multiplying {A, B, C} by {C} get rid of C?

>
> Crap. I meant divisor. It's noon and I haven't had breakfast yet.
>
> Rewritten:
>
> 1) Multiply the divisor by the projection of the dividend over
> the attribute you want to discard (multiply here is cartesian
> product)
> 2) Multiply the divisor by the domain of the attribute you
> want to discard

Okay, now it makes sense to me. Sort of. If I multiply the divisor by the entire domain, doesn't the divide then request all of the A's associate with the entire domain of C? ie. "All of the suppliers who supply every imaginable part past, present and future" ?

I don't see how the aggregate divide relates to the regular divide at all.

>>ie. (ItsEarlyAndICan & {C}) $ (Total = sum(b))
>>
>>Did you meant to say "multiply the divisor by the projection of the
>>dividend over the attribute you want to discard" ?

>
> Yeah.
>
>
>
>>
>>>3) Add a "Highly Error Correlated" parameter to the aggregate
>>>function. (That is, a parameter that isn't used in calculating
>>>the aggregate.)
>>
>>>This is more an issue for the internal language than the
>>>external one; I don't expect hardly anyone to be interested
>>>in these issues for practical reasons; I'd expect people to
>>>use the GROUP BY construct that behaves in a more
>>>convenient and intuitive way.
>>
>>>In fact, I'm not even altogether sure that it's worth putting
>>>this into the internal language. But it's interesting to
>>>do the math.
>>
>>>>>So I would expect to add to the external language a construct
>>>>>similar to SQL's GROUP BY that was defined in terms of
>>>>>relational division. (Yet unlike in SQL was algebraic.)
>>
>>>>>Amusingly, we convert the division into a GROUP BY by
>>>>>multiplying the dividend. Adding attributes to the dividend
>>>>>removes them from the result. We can multiply by either
>>>>>the projection of the divisor over the attributes we wish
>>>>>to remove from the result, or by the entire domain of
>>>>>the attributes we wish to remove.
>>
>>>>>Or, in a third possibility, we can add those attributes as
>>>>>parameters to the aggregate function. So if we wanted
>>>>>to take ItsEarly and compute "sum(C) group by A" we would
>>>>>use an aggregate function sum', which took B as a parameter
>>>>>but didn't inspect that parameter.
>>
>>>>>def sum'(B, C) = sum(B)
>>
>>>>>ItsEarly $ (Total = sum'(B, C))
>>>>>A Total
>>>>>1 6
>>>>>2 11
>>
>>>>That seems quite yucky from an aesthetic point of view.
>>
>>>Which one is yucky? The third one? All of them?
>>
>>The third one. The others simply did not make sense to me. I understood
>>the third one which I found kludgy or hackish -- in short yucky.

>
> Okay.

Now that I understand the first two, I find them kinda yucky too. Would you specify 'project' similarly?

>>>>>Lately I am running in to that construct: an ignored function
>>>>>parameter.
>>
>>>>>f:(a, b) | forall b: forall b': f(a, b) = f(a, b')
>>
>>>>>Does this have a name?
>>
>>>>Um, in programming I would call it "highly error-correlated". HEC for
>>>>short, maybe?
>>
>>>Heh.
>>
>>Seriously. From what I recall, unused parameters and unused local
>>variables correlate very highly with bugs.

>
> No, I completely buy it. But remember I'm talking language
> semantics here, not techniques for writing code. The division
> part I don't expect will get direct use, however it is useful to
> understand
> its mathematical properties.

It sounds like 'project' might be some special case of 'divide' assuming the aggregate divide really does have some connection to the regular non-aggregate divide. Received on Mon Apr 02 2007 - 22:13:19 CEST

Original text of this message