Re: Real world issue: How to "split" queries in presence of record replication and replication sensitive aggregate functions ?

From: Marshall <marshall.spight_at_gmail.com>
Date: 15 Sep 2006 18:17:09 -0700
Message-ID: <1158369429.438007.258820_at_b28g2000cwb.googlegroups.com>


pamelafluente_at_libero.it wrote:
>
> Please feel free to replace the following with the *right* english
> terminology.
>
> 1.
> For "aggregate" function (please correct terminology if wrong) I mean
> something like:
>
> sum()
> avg()
> count()
> var()
> or any user function that turns a bunch of values into a single scalar

Yes, these are commonly called aggregate functions.

> This involves a group of record and requires a GROUP BY clause.

It's the other way around, actually: a GROUP BY requires an aggregate function, but an aggregate function does not require a GROUP BY.

> [...]
> "aggregate function" is in some sense the opposite to "row function"

"Opposite" is a poor choice of words. "Dual" maybe. But even that's not correct.

> 2.
> For "Replication sensitive" I mean:
>
> - An aggregate function such that if some value (operand) is
> replicated the result comes out wrong.
>
> For instance:
>
> * Replication sensitive* :
> sum()
> count()
> avg()
> Any Quantile and so on
> ...
>
> * Not Replication sensitive * :
> max()
> min()
> count distinct ()
> anything not influenced by the replication of values

The term is idempotent. If the binary form of the aggregate function is idempotent, the aggregate will return the same value even if values are repeated arbitrarily. Since + is not idempotent, sum() is "sensitive" to repeated values. Since binary min *is* idempotent, aggregate min() is not "sensitive" to repeated values.

> Yes I will. But for ease of discussion and so that anyone here can
> follow and make observations. I propose to take a free standard well
> know database, easy to work with. I suggest to take Northwind for
> ACCESS.
Actually that's not a good idea. When discussing this sort of question, (or most any sort of question, really) it is best to start with *minimal* examples. You want to have the fewest number of tables and the fewest number of attributes to demonstrate the issue. Since you're quite concerned about generality, there is also the issue of whether the minimal example will fully generalize. But if you've chosen your minimal example well, it often will. There's no need to haul the entirety of Northwind into the conversation. So far I've avoided jumping in on the conversation because I've been unable to disentagle your complicated posts.

Anyway, to the extent that I understand your question, the answer is either trivial or impossible as the case may be. If you have an idempotent aggregate, it's a non-issue. If you have a non-idempotent aggregate, and you apply a transformation to a relation but wish to know the value of the aggregate before the transformation, then either your transformation is reversible, in which case you reverse the transformation and apply the aggregate, or else the transformation is not reversible, in which case it's impossible.

If I haven't understood your question correctly, it would be nice if you could construct a *minimal* example for your question, and we could start from there.

Marshall Received on Sat Sep 16 2006 - 03:17:09 CEST

Original text of this message