Re: Idempotence and "Replication Insensitivity" are equivalent ?

From: Marshall <marshall.spight_at_gmail.com>
Date: 19 Sep 2006 02:21:07 -0700
Message-ID: <1158657667.312714.107970_at_m73g2000cwd.googlegroups.com>


Chris Smith wrote:
> Well,
>
> Now that we've got this far, I have to go back and change the problem.

Chris,

You post was wonderful, as always, but it's a bit narrow in terms of what it can define, and also I think a modestly higher-order approach would simplify things a bit. Also I'm not sure your introduction of the identity element was done in exactly the right manner.

As to the higher-orderness, I think I would prefer to build everything with generic folds, rather than define a new inductive h() for every g on the way to defining f.

  fold : (f: (A, B -> B), S : M(A), e : B) -> B

  fold f {} e = e
  fold f {a} e = f(a, e)
  fold f {a union P(A)} e = f(a, fold2(f, P(A), e))

f is invoked a number of times = cardinality of S. If S is empty, fold returns e. f is not required to be commutative and associative, but if it isn't, then it is likely that the result will depend on the order of selection of elemens of S, and in such cases it would be better to consider S : [A] rather than M(A). (That is, the collection S must be ordered.) Fold
returns a value of type B.

If f is commutative and associative, as usual, and also idempotent, then a fold over f will produce the same result, regardless of any repeated values in S.

Once we have fold, we need a mechanism whereby the repeated evaluations of the aggregate function on the scalars in a relation can be translated into a fold over the multiset of all the scalars. I'll just wave my hands for that part.

  sum(x) = fold(+, all_values_of(x), 0)

Note that this doesn't cover all aggregates. We can't do avg() with this approach. We could do these kinds of aggregates with an accumulator, but then we'd need some mechanism to apply a function to the components of the accumulator to get the desired result at the end. I vaguely perceive that that's the model postgres uses for aggregates; that there's a final combining function. But it's simpler just to be able to write aggregates that are expressions written on other aggregates, as you've mentioned already.

  avg(x) = sum(x) / count(x)

Count distinct is a bit of a bastard. It can be done with an f as follows:

  count_helper(x:A, S:{A}) = {x} union S

and then using {} as e. Then the aggregate form is

  count_distinct(x) = cardinality(count_helper(x))

But mostly I ignore count distinct, because it pretty much only exists as a workaround for how poorly SQL expressions compose. If you had a more orthogonal language, you'd just project and then count, instead of having the rigamarole of count distinct vs. count.

Now, it's quite late and my math skills are not up to the level of the subset of sci.math denizens that actually do math, (as opposed to try to refute Cantor, or string theory, or agitate about Pluto's status, or whatever; sci.math seems to be a magnet for nutballs) so I may be backpedaling shortly. :-) It's past 2AM and I'm still not sleepy, so I'm going to go have some cereal and who knows? Maybe there's a new Robot Chicken episode on the TiVo.

Marshall Received on Tue Sep 19 2006 - 11:21:07 CEST

Original text of this message