Re: Lucid statement of the MV vs RM position?

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 9 May 2006 11:06:21 -0700
Message-ID: <1147197981.327539.290100_at_u72g2000cwu.googlegroups.com>


Jon Heggland wrote:
> Marshall Spight wrote:
> >>
> >> You don't get from relation (1) to relation (2) in my post to Marshall
> >> just by using SUM. You have to use SUMMARIZE as well.
> >
> > I agree, however I consider this point so obvious as to be not
> > worth mentioning. No one bothers to distinguish carefully
> > between, say, a function literal and an application of that
> > function in speech. It's just quite clear from context.
>
> I'm not quite sure I understand your argument here; please ignore if I'm
> setting up a straw man here. I think the distinction between SUMMARIZE
> ... ADD (SUM(...) AS ...) and SUM(...) is crucial for at least two reasons:

Yes, it's crucial, but it's also ... I dunno. "Obvious" I guess.

> 1. A SUMMARIZE can apply more than one aggregate operator.

Yeah, this is a good point. This is why I actually prefer the RM view of aggregates over the functional programming ("FP") one, even though the FP one is (I believe) more general. Also the FP world tends to focus on lists over sets, which I don't think is where the money is, so to speak.

Consider: if I said "The + operator adds two numbers" and you said "No, you need an APPLICATION of the + operator to do that"-- it's sort of overdoing the specificity, don't you think? Even though it is "crucial" to distinguish between an operator and an application of that operator.

> 2. If you define aggregate operators this way, you have to repeat the
> definition of the "grouping procedure" for each one. Using SUMMARIZE, it
> is factored out, and all you need to add a new aggregate operator is to
> specify the operator and the identity.

Sure. It's a good thing.
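
As a rough illustration of the factoring described in point 2, here is a minimal Python sketch. It is not Tutorial D and does not claim to reproduce SUMMARIZE semantics exactly; the function name `summarize`, its parameters, and the sample rows are all hypothetical. The grouping loop is written once, and each aggregate is supplied only as an attribute, an operator, and an identity.

```python
from functools import reduce
import operator

def summarize(rows, by, **aggregates):
    """Group rows (dicts) on the attributes in `by`, then fold each
    requested aggregate over every group.

    Each aggregate is given as (attribute, operator, identity).
    Hypothetical helper for illustration only.
    """
    # The "grouping procedure" lives here, once, for all aggregates.
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in by)
        groups.setdefault(key, []).append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(by, key))
        # Several aggregates can be applied to the same group in one pass.
        for name, (attr, op, identity) in aggregates.items():
            out[name] = reduce(op, (m[attr] for m in members), identity)
        result.append(out)
    return result

# Made-up sample data.
rows = [
    {"dept": "A", "qty": 2},
    {"dept": "A", "qty": 3},
    {"dept": "B", "qty": 4},
]

summary = summarize(
    rows,
    ["dept"],
    total=("qty", operator.add, 0),    # a SUM-like aggregate
    product=("qty", operator.mul, 1),  # a second aggregate, same grouping
)
```

Adding another aggregate here means passing one more (attribute, operator, identity) triple; the grouping code is untouched.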

> Furthermore, Tutorial D's aggregate operators can also be used as
> expressions directly, i.e. not only within the context of a SUMMARIZE.
> For example, SUM(R,A) is an expression evaluating to the sum of the A
> attribute in relation R. If SUMMARIZE ... ADD (SUM(...) AS ...) is
> called "the aggregate operator SUM", what should we call SUM(R,A)? They
> are obviously not the same.

This is just syntax, which is ultimately not very interesting.

> I'm not familiar with those terms, so I cannot validate your claim that
> they would be perfectly good in this case. :) In any case, I thought
> nest/unnest was the perfectly good terms that D&D discard. ;)
>
> (I for one have sympathy for "making up" terminology when the existing
> is unsatisfactory. For instance, I think "irreducible key" is much
> better than "minimal key", for reasons I deem obvious, even though
> "minimal" may be decades-old.)

Yeah, I'm not staunchly insistent on using established terms. For example, it is standard usage today to call a named value a "variable" whether it supports update/assignment or not, which I don't think makes much sense. But coining new terms where existing terms are as good or almost as good is just terrible, and "summarize" falls into this latter category, IMHO.

> > Only SQL and TutD, that I'm aware of, bother
> > to have a separate abstraction called an "aggregate
> > function."
>
> What do other relational or quasi-relation systems do, then? What kind
> of terminology/syntax would you prefer?

The FP world often has a "fold" function as part of the standard library. You pass it a collection, an operator, and an identity value. I actually don't prefer this, though, because of the value of the things you mentioned earlier, such as the "group by" aspect, and the multiple-aggregates-at-once-easily aspect.
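
A minimal sketch of such a fold, in Python rather than any particular FP language: the wrapper name `fold` and its argument order are chosen here just to match the description (collection, operator, identity); the underlying call is the standard `functools.reduce`.

```python
from functools import reduce
import operator

def fold(collection, op, identity):
    """Combine the elements of `collection` with the binary operator
    `op`, starting from `identity`. Illustrative wrapper only."""
    return reduce(op, collection, identity)

total = fold([3, 4, 5], operator.add, 0)    # sum: identity is 0
product = fold([3, 4, 5], operator.mul, 1)  # product: identity is 1
empty_total = fold([], operator.add, 0)     # empty input yields the identity
```

Note that each call computes a single aggregate over a single collection; any grouping, or computing several aggregates together, has to be arranged around it by the caller.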

Marshall

Received on Tue May 09 2006 - 20:06:21 CEST