Re: Lucid statement of the MV vs RM position?

From: Jon Heggland <jon.heggland_at_idi.ntnu.no>
Date: Mon, 08 May 2006 23:10:55 +0200
Message-ID: <e3oc8l$qbr$1_at_orkan.itea.ntnu.no>


Marshall Spight wrote:
> Jon Heggland wrote:

>> I focus on the significant difference between
>>
>> 1. An aggregate operator
>> 2. The invocation of an aggregate operator within a SUMMARIZE operator
>>
>> You don't get from relation (1) to relation (2) in my post to Marshall
>> just by using SUM. You have to use SUMMARIZE as well.

>
> I agree, however I consider this point so obvious as to be not
> worth mentioning. No one bothers to distinguish carefully
> between, say, a function literal and an application of that
> function in speech. It's just quite clear from context.

I'm not quite sure I understand your argument; please ignore this if I'm setting up a straw man. I think the distinction between SUMMARIZE ... ADD (SUM(...) AS ...) and SUM(...) is crucial for at least two reasons:

  1. A SUMMARIZE can apply more than one aggregate operator. If you use the name SUM for a SUMMARIZE that applies iterated addition on one attribute, and the name GROUP for a SUMMARIZE that applies iterated union on one attribute, what do you call a SUMMARIZE that applies both addition and union in one go? Or would you disallow such an operation, forcing it to be implemented as two separate aggregate operator invocations, followed by a join? Much pain for no gain, if you ask me.
  2. If you define aggregate operators this way, you have to repeat the definition of the "grouping procedure" for each one. Using SUMMARIZE, it is factored out, and all you need to add a new aggregate operator is to specify the operator and the identity.
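Both points can be illustrated with a rough sketch in Python rather than Tutorial D. Everything here is an assumption made for illustration: a relation is modelled as a list of dicts, and the names summarize, SUM and UNION are made up, not anyone's actual API. The grouping procedure is written once, and an aggregate operator is nothing more than a binary operator paired with its identity.

```python
from functools import reduce

def summarize(rel, by, **adds):
    """Group rel (a list of dicts) on the attributes in `by`, then fold each
    requested aggregate over every group. Each keyword argument has the form
    new_name=(attribute, (binary_op, identity))."""
    groups = {}
    for t in rel:
        key = tuple(t[a] for a in by)
        groups.setdefault(key, []).append(t)
    result = []
    for key, tuples in groups.items():
        row = dict(zip(by, key))
        for name, (attr, (op, identity)) in adds.items():
            row[name] = reduce(op, (t[attr] for t in tuples), identity)
        result.append(row)
    return result

# A new aggregate operator needs only the operator and the identity.
SUM = (lambda x, y: x + y, 0)
UNION = (lambda x, y: x | y, frozenset())

# Hypothetical sample data; "parts" is a set-valued attribute.
sp = [
    {"s": "S1", "qty": 300, "parts": frozenset({"P1"})},
    {"s": "S1", "qty": 200, "parts": frozenset({"P2"})},
    {"s": "S2", "qty": 400, "parts": frozenset({"P1"})},
]

# One summarize applies both addition and union in one go.
out = summarize(sp, ["s"], total=("qty", SUM), all_parts=("parts", UNION))
```

With this shape, applying addition and union together needs no second invocation and no join, and the grouping code is not repeated per operator.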

Furthermore, Tutorial D's aggregate operators can also be used as expressions directly, i.e. not only within the context of a SUMMARIZE. For example, SUM(R,A) is an expression evaluating to the sum of the A attribute in relation R. If SUMMARIZE ... ADD (SUM(...) AS ...) is called "the aggregate operator SUM", what should we call SUM(R,A)? They are obviously not the same.
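The difference in kind can be sketched in Python (again not Tutorial D; agg_sum and summarize_sum are hypothetical names, and a relation is assumed to be a list of dicts). The first function plays the role of SUM(R,A) and returns a scalar; the second plays the role of SUMMARIZE ... ADD (SUM(...) AS ...) and returns a relation.

```python
from functools import reduce
import operator

def agg_sum(rel, attr):
    """Analogue of SUM(R,A): a single value, the sum of attr over rel."""
    return reduce(operator.add, (t[attr] for t in rel), 0)

def summarize_sum(rel, by, attr, name):
    """Analogue of the SUMMARIZE form: one tuple per distinct value of the
    `by` attributes, each carrying the sum of attr within that group."""
    keys = []
    for t in rel:
        k = tuple(t[a] for a in by)
        if k not in keys:
            keys.append(k)
    result = []
    for k in keys:
        group = [t for t in rel if tuple(t[a] for a in by) == k]
        result.append({**dict(zip(by, k)), name: agg_sum(group, attr)})
    return result

# Hypothetical sample relation with attributes b and a.
r = [{"b": "x", "a": 1}, {"b": "x", "a": 2}, {"b": "y", "a": 5}]

scalar = agg_sum(r, "a")                        # a number
relation = summarize_sum(r, ["b"], "a", "total")  # a list of tuples
```

Note that the second function is built from the first plus grouping, which is why giving both the same name would be confusing.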

> There are quite a number of different programming models
> or functional programming languages that support "fold"
> or "reduce". (D&D here again making up their own terminology
> when perfectly good decades-old terms exist, confusing the
> discussion.)

I'm not familiar with those terms, so I cannot validate your claim that they would be perfectly good in this case. :) In any case, I thought nest/unnest were the perfectly good terms that D&D discard. ;)

(I for one have sympathy for "making up" terminology when the existing terminology is unsatisfactory. For instance, I think "irreducible key" is much better than "minimal key", for reasons I deem obvious, even though "minimal" may be decades old.)

> Only SQL and TutD, that I'm aware of, bother
> to have a separate abstraction called an "aggregate
> function."

What do other relational or quasi-relational systems do, then? What kind of terminology/syntax would you prefer?

-- 
Jon
Received on Mon May 08 2006 - 23:10:55 CEST
