Re: U_SUMMARIZE: how aggregate is calculated?

From: <mikharakiri_nospaum_at_yahoo.com>
Date: 1 Dec 2004 22:45:13 -0800
Message-ID: <1101969913.402540.176690_at_c13g2000cwb.googlegroups.com>


Jonathan Leffler wrote:
> Mikito Harakiri wrote:
> > Suppose we have one column relation which is essentially a set of
intervals:
> >
> > {[1,2], [3,5], [4,7]}
> >
> > and I want to project away the interval column and calculate
aggregated
> > value of SUM(1). What the result there would be? 3? 2? 7?
>
> Now, your subject line mentions U_SUMMARIZE, but it is not clear that

> your question involves it.

Well, isn't SUMMARIZE is the only way to do aggregation in Date et al relational syntax?

> Suppose your relation is R and the interval attribute (of type
> INTERVAL_INTEGER, I'm assuming) is A - not that it matters much.
>
> You want to project away the interval column -- OK, that's R { }, and

> the result is a table with one row containing an empty tuple, aka
> TABLE_DEE. The aggregated value of SUM(1) is 1 because there's one
row.
>
> ...oh?...

Nope. "Aggregate-groupby" is an [generalized] projection, right? For example,

select empno, sum(1+sal) from emp
group by empno

is a [generalized] projection to empno. All the other columns of the original relation are projected away. After we removed extra column and merged duplicates, ordinary projection would have just one column. In the extended projection we just do some extra calculation during duplicates merge and calculate aggregate expressions.

> I don't think that's the question you intended to ask. So, let's
> rewrite the question to use a U_SUMMARIZE operation.
>
> USING ( A) <| SUMMARIZE R PER ?R2? ADD SUM(1) AS B |>
>
> What are we to use as 'PER ?R2?'? Let's assume that we use TABLE_DEE

> again (also achieved by omitting the PER clause altogether, I
> believe), and therefore ACL' is empty.
>
> UNPACK TABLE_DEE ON () is just TABLE_DEE.
>
> So, the U_SUMMARIZE is a shorthand for...
>
> PACK(SUMMARIZE(UNPACK R ON A) PER TABLE_DEE ADD SUM(1) AS B) ON ()
>
> Any PACK R ON () operation also yields R, hence we have:
>
> SUMMARIZE ( UNPACK R ON A ) PER TABLE_DEE ADD SUM(1) AS B
>
> Given the sample data, UNPACK R ON A yields (using your notation with

> commas in the middle of an interval rather than colons as in TDRM):
>
> RELATION { TUPLE { A [1,1] }, TUPLE { A [2,2] }, TUPLE { A [3,3] },
> TUPLE { A [4,4] }, TUPLE { A [5,5] }, TUPLE { A [6,6] },
> TUPLE { A [7,7] } }
>
> Summarizing that PER TABLE_DEE ADD SUM(1) AS B yields, I think:
>
> RELATION
> { TUPLE { A [1,1], B 1 },
> TUPLE { A [2,2], B 1 },
> TUPLE { A [3,3], B 1 },
> TUPLE { A [4,4], B 1 },
> TUPLE { A [5,5], B 1 },
> TUPLE { A [6,6], B 1 },
> TUPLE { A [7,7], B 1 }
> }
>
> I suspect that this means either (1) I've boobed (please be gentle)
or
> (2) I've not used the right U_SUMMARIZE expression. Maybe instead of

> PER TABLE_DEE, I should be using PER (UNPACK R ON A).

Why your output relation has 2 columns? We project away the interval column A, right? Therefore, if we were calculating ordinary projection then we would get a relation with no columns at all. The generalized projection would include extra aggregate expression column. Therefore the result should be a single column relation.

In order to avoid further misinterpretation, let me write the query in SQL select sum(1) from R

Wait a minute, isn't SQL supposed to be verbose compared to Tutorial D? Received on Thu Dec 02 2004 - 07:45:13 CET

Original text of this message