Re: U_SUMMARIZE: how aggregate is calculated?

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Thu, 02 Dec 2004 05:27:39 GMT
Message-ID: <fJxrd.5837$u81.1139_at_newsread3.news.pas.earthlink.net>


Mikito Harakiri wrote:
> Suppose we have one column relation which is essentially a set of intervals:
>
> {[1,2], [3,5], [4,7]}
>
> and I want to project away the interval column and calculate aggregated
> value of SUM(1). What the result there would be? 3? 2? 7?

I've just been re-reading TDRM ("Temporal Data and the Relational Model" by Date, Darwen and Lorentzos) which is one place where U_SUMMARIZE is defined - and I'm not aware of anywhere else that defines it.

In chapter 9, it defines U_SUMMARIZE as an expression like:

USING ( ACL ) <| SUMMARIZE R1 PER R2 ADD summary AS B |>

(The <| and |> are solid arrowheads in the book).

This is a shorthand for:

PACK
    ( SUMMARIZE ( UNPACK R1 ON ( ACL ) )

      PER ( UNPACK R2 ON ( ACL' ) )
      ADD summary AS B ) )

ON ( ACL' ) It annotates this with: Every attribute mentioned in ACL must be of some interval type and must be an attribute of R1; ACL' is the same as ACL, except that any attribute in ACL that does not appear in R2 is simply ignored.

Now, your subject line mentions U_SUMMARIZE, but it is not clear that your question involves it.

Suppose your relation is R and the interval attribute (of type INTERVAL_INTEGER, I'm assuming) is A - not that it matters much.

You want to project away the interval column -- OK, that's R { }, and the result is a table with one row containing an empty tuple, aka TABLE_DEE. The aggregated value of SUM(1) is 1 because there's one row.

   ...oh?...

I don't think that's the question you intended to ask. So, let's rewrite the question to use a U_SUMMARIZE operation.

USING ( A) <| SUMMARIZE R PER ?R2? ADD SUM(1) AS B |>

What are we to use as 'PER ?R2?'? Let's assume that we use TABLE_DEE again (also achieved by omitting the PER clause altogether, I believe), and therefore ACL' is empty.

UNPACK TABLE_DEE ON () is just TABLE_DEE.

So, the U_SUMMARIZE is a shorthand for...

PACK(SUMMARIZE(UNPACK R ON A) PER TABLE_DEE ADD SUM(1) AS B) ON () Any PACK R ON () operation also yields R, hence we have:

SUMMARIZE ( UNPACK R ON A ) PER TABLE_DEE ADD SUM(1) AS B Given the sample data, UNPACK R ON A yields (using your notation with commas in the middle of an interval rather than colons as in TDRM):

RELATION { TUPLE { A [1,1] }, TUPLE { A [2,2] }, TUPLE { A [3,3] },

            TUPLE { A [4,4] }, TUPLE { A [5,5] }, TUPLE { A [6,6] },
            TUPLE { A [7,7] } }

Summarizing that PER TABLE_DEE ADD SUM(1) AS B yields, I think:

RELATION
{ TUPLE { A [1,1], B 1 },

   TUPLE { A [2,2], B 1 },
   TUPLE { A [3,3], B 1 },
   TUPLE { A [4,4], B 1 },
   TUPLE { A [5,5], B 1 },
   TUPLE { A [6,6], B 1 },
   TUPLE { A [7,7], B 1 }

}

I suspect that this means either (1) I've boobed (please be gentle) or (2) I've not used the right U_SUMMARIZE expression. Maybe instead of PER TABLE_DEE, I should be using PER (UNPACK R ON A).

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
Received on Thu Dec 02 2004 - 06:27:39 CET

Original text of this message