Re: Idempotence and "Replication Insensitivity" are equivalent ?

From: Chris Smith <cdsmith_at_twu.net>
Date: Tue, 19 Sep 2006 14:25:07 -0600
Message-ID: <MPG.1f7a0009fa7e3c3398971b_at_news.altopia.net>


Marshall <marshall.spight_at_gmail.com> wrote:
> You post was wonderful, as always, but it's a bit narrow in
> terms of what it can define, and also I think a modestly
> higher-order approach would simplify things a bit. Also
> I'm not sure your introduction of the identity element was
> done in exactly the right manner.

> As to the higher-orderness, I think I would prefer to
> build everything with generic folds, rather than define
> a new inductive h() for every g on the way to defining f.
>
> fold : (f: (A, B -> B), S : M(A), e : B) -> B
>
> fold f {} e = e
> fold f {a} e = f(a, e)
> fold f {a union P(A)} e = f(a, fold2(f, P(A), e))

Did you mean fold instead of fold2 there? If not what's fold2? If so, read on.

Okay. So far as I can see, this is equivalent to what I described, except that you differentiate between the sets A and B. I admit to combining A and B for ulterior motives; I thought it would simplify the definition of idempotence of f (my g), and I didn't think we'd lose anything since I figured we could just do a union of your A and B, and call the result A. I believe I was wrong, because it doesn't allow for the case where M(A) is a subset of B, but that was the intent.

> f is invoked a number of times = cardinality of S.
> If S is empty, fold returns e.

Yes, that was my first mistake.

> f is not required to be commutative and associative,
> but if it isn't, then it is likely that the result will
> depend on the order of selection of elemens of
> S

Right. Given that f is associative and commutative, you are guaranteed that fold(f,...) is well-defined. (Proof is fairly simple by induction.) Also, in order for fold(f,...) to be well-defined, it is necessary that *either* f must be associative and commutative, *or* e cannot be an identity of f. (Proof: Let e be an identity of f. First consider possible evaluations of fold(f,{a,b},e). They must be equal to each other, so f must be commutative in order for fold(f,...) to be well-defined. Now consider fold(f,{a,b,c}). Using the commutative property as established above, f(f(a,b),c) must be equal to f(a,f(b,b)), so f is associative as well.) That is still not a sufficient condition for well-definedness in the case where e is not an identity, but I don't see an easy sufficient condition.

> , and in such cases it would be better to
> consider S : [A] rather than M(A).

Whether it's better to consider that or not depends on what we're trying to do. If we are trying to characterize the results of an aggregate function on a relation attribute, then it's actually quite irrelevant, I would think.

> If f is commutative and associative, as usual, and
> also idempotent, then a fold over f will produce the
> same result, regardless of any repeated values in S.

Yes. That's what was shown earlier (with a different definition, but one that's equivalent under those conditions). Furthermore, if fold is definable as a well-defined function where e is an identity of f, then f is idempotent (and associative, and commutative).

> Once we have fold, we need a mechanism whereby
> the repeated evaluations of the aggregate function
> on the scalars in a relation can be translated into a
> fold over the multiset of all the scalars. I'll just wave
> my hands for that part.

What seems like "hand-waving" here? The act of extracting a column of a relation as a multiset? I don't see why that requires any additional clarification.

> Note that this doesn't cover all aggregates. We can't
> do avg() with this approach. We could do these kinds
> of aggregates with an accumulator, but then we'd
> need some mechanism to apply a function to the
> components of the accumulator to get the desired
> result at the end.

Indeed. In fact, avg() needs two evaluations: one to count the elements, and another to add them up. I spoke a friend in the local CS department here last night, and he proposed the definition you are responding to (which defines primitive aggregate functions, and then defines aggregate functions as simple combinations of the values of primitive aggregate functions) mainly in order to solve this problem.

> Count distinct is a bit of a bastard.

As Bob mentioned, the distinct operator is more of a transformation on a projection of the relation than a transformation on an aggregate function. Although technically possible, I don't think it does anyone any good to define it as an aggregate function.

-- 
Chris Smith
Received on Tue Sep 19 2006 - 22:25:07 CEST

Original text of this message