Re: U_SUMMARIZE: how aggregate is calculated?

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Sat, 04 Dec 2004 07:57:18 GMT
Message-ID: <y5esd.991$yr1.428_at_newsread3.news.pas.earthlink.net>


Mikito Harakiri wrote:

> "Jonathan Leffler" <jleffler_at_earthlink.net> wrote:

>>Mikito Harakiri wrote:
>>
>>>Given
>>>
>>>R: {A=[1,2], A= [3,5], A= [4,7]}
>>>
>>>what is the result of
>>>
>>>USING ( A ) <| SUMMARIZE R
>>>PER TABLE_DEE
>>>ADD SUM(1) AS THE_SUM_IN_QUESTION |>
>>
>>I believe the answer to this precisely stated question is
>>
>> RELATION { TUPLE { THE_SUM_IN_QUESTION 7 } }
> 
> 
> OK. I see only one issue here. The result depends on the discretization
> granularity. In the problem description I never mentioned that the intervals
> have been built out of integers. What if they are rationals? Most of the
> queries are invariant to the discretization, however. In other words, the
> queries are supposed to produce the same answer no matter how fine
> discretization is.

These aggregate queries definitely produce different answers depending on the quantum size - or how fine the 'discretization' is. If the RATIONAL type is constrained to 2 dp, for example, then the answer would be 600

>>You are working with a degenerate case - your input relation only has
>>the interval-valued column. Normally, the input relation would have
>>other columns too.
>>
>>The U_SUMMARIZE operation you've specified is a short-hand for:
>>
>>PACK
>> ( SUMMARIZE ( UNPACK R ON ( A ) )
>> PER ( UNPACK TABLE_DEE ON ( ) )
>> ADD SUM(1) AS THE_SUM_IN_QUESTION
>> )
>>ON ( )
>>
>>Unpacking any relation on the empty set of columns doesn't alter the
>>relation, so UNPACK TABLE_DEE ON ( ) is the same as TABLE_DEE.
>>Similarly, the outer PACK is a no-op. So, the overall expression is
>>equivalent to:
>>
>>SUMMARIZE ( UNPACK R ON ( A ) )
>> PER TABLE_DEE
>> ADD SUM(1) AS THE_SUM_IN_QUESTION
>>
>>I'm sure we agree that UNPACK R ON (A) yields a relation with 7
>>tuples, each of which is a unit-interval.

> 
> 
> "Init interval"! Perhaps, this is the answer to my previous concern. Do Date
> et al always use unit discretization, or they allow to perform finer
> discretization (but, then, the result should be normalized to unit
> interval)? Note, that the aggregation on the interval domain is essentially
> Rieman integral definition.
> 
> 

>>When you use SUMMARIZE R1 PER R2, the attributes in R2 must all also
>>appear in R1 (same name, same type). For each tuple in R2, the set of
>>tuples from R1 with the same values as the tuple in R2 are treated as
>>a group, and the aggregate operation runs over all the tuples in the
>>group.
>>
>>Also, the output relation from SUMMARIZE R1 PER R2 consists of the
>>columns from R2 (in this case, no columns) plus the added <summary> -
>>in this case the answer 7. There is one tuple in the output for each
>>tuple in R2. Now, TABLE_DEE clearly has no attributes, and the
>>relation with 7 tuples each containing a unit intervals shares the
>>empty set of attributes with TABLE_DEE, and SUM(1) over those members
>>produces the answer 7.
>>
>>Further, the output relation from SUMMARIZE R1 PER R2 consists of the
>>columns from R2 (in this case, no columns) plus the added <summary> -
>>in this case the answer 7. There is one tuple in the output for each
>>tuple in R2, and TABLE_DEE contains just one tuple, so the result
>>relation also contains just one.
> 
> 
> Oh, the result set relation shape -- one column, one tuple -- was obvious
> from the very beginning. SUMMARIZE definition (by Hugh Darwen?), however,
> involves 2 relations. It's difficult to develop intuition with that kind of
> syntax.
> 
> 

>>>Unlike ordinary SUMMARIZE there is a USING clause. Does it have any
>>>effect at all?
>>
>>Yes; if you evaluated:
>>
>> SUMMARIZE R PER TABLE_DEE ADD SUM(1) AS THE_SUM_IN_QUESTION
>>
>>the answer would be:
>>
>> RELATION { TUPLE { THE_SUM_IN_QUESTION 3 } }
>>
>>because there are three tuples in R.
> 
> 
> Thanks for clarifying this. It looks like significant part of your effort
> has been spent fighting syntax:-)
> 
> 


-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/ Received on Sat Dec 04 2004 - 08:57:18 CET

Original text of this message