Re: Real world issue: How to "split" queries in presence of record replication and replication sensitive aggregate functions ?

From: <pamelafluente_at_libero.it>
Date: 16 Sep 2006 03:17:06 -0700
Message-ID: <1158401826.131193.54000_at_e3g2000cwe.googlegroups.com>


Marshall ha scritto:
> The term is idempotent. If the binary form of the aggregate
> function is idempotent, the aggregate will return the same value
> even if values are repeated arbitrarily. Since + is not idempotent,
> sum() is "sensitive" to repeated values. Since binary min *is*
> idempotent, aggregate min() is not "sensitive" to repeated
> values.

I have been reading the definition of idempotence http://en.wikipedia.org/wiki/Idempotent

it says:
A unary operation (i.e., a function), is idempotent if, whenever it is applied twice to any element, it gives the same result as if it were applied once: f maps X to X
f is an idempotent function: f(f(x)) = f(x) for all x,

In my case I mean something different (the function is not unary, aggregate function
works on several values). Example:

count distinct ({ 1,2,5, 7,7,7,7,7, ... }) does not change if I replicate 7 any number of times
for sum or count or avg... this replication does change the result.

So far I cannot see that the above is the same as the idempotence definition I read on wiki. But please if you have a definition that shows that please let me know. I do want to use the right term for this concept.

Your statement "If the binary form of the aggregate function is idempotent, the aggregate will return the same value even if values are repeated arbitrarily" is not wrong, but is not equivalent to my definition of a "replication sensitive function (RS)".

Your statement is the same as to say:

   Idempotence ==> not replication sensitive

But it does not imply the other way round. I am saying that we need double implication for equivalence: <==> . So you statement is true but does not seem to be the definition of what I am calling a "replication sensitive function".

 It's important we clarify terminology to be sure we are talking of the same thing. Please
feel free to correct me if I am wrong.

> If I haven't understood your question correctly, it would
> be nice if you could construct a *minimal* example for
> your question, and we could start from there.

Ok I made a proposal before and that could be used. If you want to make one
or more please do. I would be happy to try to apply my empirical idea to that
so that you can eventually say that it makes sense or not to you.

> Anyway, to the extent that I understand your question, the
> answer is either trivial or impossible as the case may be.

 What I am saying is that when there are RS such functions as outcome of a report, there is sometimes (depending on the relationships) the need to split the query in a Union of subqueries, because the presence of replicated values may alterate the result.

 All leader reporting softwares are capable to generate such subqueries. I am looking for a systematic method to generate those subqueries. I am not inventing the problem: there is no doubt that the problem is there. I just want to understand how to provide the answer.

 Nature of data is unimportant, semantic is unimportant, "right" design is unimportant. The only thing that matter for this problem is to be able to provide a *formally* correct report, whatever the design is. Formally correct mean that computation of functions will be right and not wrong due to value replication. Whatever is the design, wrong or right that might be. It's actually * better if you propose highly pathological designs * because there we can test better our logic.

In the example I made above, you see that thw first query (that would for instance be generated by access gives a wrong result. The second one, which would be generated by Business Object, seems a right answer).

-P Received on Sat Sep 16 2006 - 12:17:06 CEST

Original text of this message