Re: Idempotence and "Replication Insensitivity" are equivalent ?
Date: 20 Sep 2006 17:09:18 -0700
Message-ID: <1158797358.143842.254890_at_e3g2000cwe.googlegroups.com>
Chris Smith wrote:
> Marshall <marshall.spight_at_gmail.com> wrote:
> > You post was wonderful, as always, but it's a bit narrow in
> > terms of what it can define, and also I think a modestly
> > higher-order approach would simplify things a bit. Also
> > I'm not sure your introduction of the identity element was
> > done in exactly the right manner.
>
> > As to the higher-orderness, I think I would prefer to
> > build everything with generic folds, rather than define
> > a new inductive h() for every g on the way to defining f.
> >
> > fold : (f: (A, B -> B), S : M(A), e : B) -> B
> >
> > fold f {} e = e
> > fold f {a} e = f(a, e)
> > fold f {a union P(A)} e = f(a, fold2(f, P(A), e))
>
> Did you mean fold instead of fold2 there? If not what's fold2? If so,
> read on.
I meant "fold." The "2" was left over from an earlier edit, in
which I was differentiating between two different ways to
do the folds. Which I mention now, why not?
The version of fold that requires an e element is not the
entire story, although it mostly is. There are cases where
we do not have an identity nor a starting point, and so
we'd be in trouble if we required one.
So an alternative version of fold is called for.
Standard fold:
requires a starting point value
has type A, B -> B
if input is {}, returns e
in input is {a}, returns f(a, e)
Alternate fold:
does not take a starting point value
has type A, A -> A
if input is {}, diverges
if input is {a}, returns a
> Okay. So far as I can see, this is equivalent to what I described,
> except that you differentiate between the sets A and B. I admit to
> combining A and B for ulterior motives; I thought it would simplify the
> definition of idempotence of f (my g), and I didn't think we'd lose
> anything since I figured we could just do a union of your A and B, and
> call the result A. I believe I was wrong, because it doesn't allow for
> the case where M(A) is a subset of B, but that was the intent.
>
> > f is invoked a number of times = cardinality of S.
> > If S is empty, fold returns e.
>
> Yes, that was my first mistake.
I wouldn't call it a mistake, given what I said above.
> > f is not required to be commutative and associative,
> > but if it isn't, then it is likely that the result will
> > depend on the order of selection of elemens of
> > S
>
> Right. Given that f is associative and commutative, you are guaranteed
> that fold(f,...) is well-defined. (Proof is fairly simple by
> induction.) Also, in order for fold(f,...) to be well-defined, it is
> necessary that *either* f must be associative and commutative, *or* e
> cannot be an identity of f. (Proof: Let e be an identity of f. First
> consider possible evaluations of fold(f,{a,b},e). They must be equal to
> each other, so f must be commutative in order for fold(f,...) to be
> well-defined. Now consider fold(f,{a,b,c}). Using the commutative
> property as established above, f(f(a,b),c) must be equal to f(a,f(b,b)),
> so f is associative as well.) That is still not a sufficient condition
> for well-definedness in the case where e is not an identity, but I don't
> see an easy sufficient condition.
I'm not really following this. Your either/or above is supposed to be an exclusive or?
> > , and in such cases it would be better to
> > consider S : [A] rather than M(A).
>
> Whether it's better to consider that or not depends on what we're trying
> to do. If we are trying to characterize the results of an aggregate
> function on a relation attribute, then it's actually quite irrelevant, I
> would think.
Given that relation elements are unordered by definition, yes. But we may also wish to perform aggregates on ordered relations. (That is, on a (relation, total order) pair.)
> > If f is commutative and associative, as usual, and
> > also idempotent, then a fold over f will produce the
> > same result, regardless of any repeated values in S.
>
> Yes. That's what was shown earlier (with a different definition, but
> one that's equivalent under those conditions). Furthermore, if fold is
> definable as a well-defined function where e is an identity of f, then f
> is idempotent (and associative, and commutative).
>
> > Once we have fold, we need a mechanism whereby
> > the repeated evaluations of the aggregate function
> > on the scalars in a relation can be translated into a
> > fold over the multiset of all the scalars. I'll just wave
> > my hands for that part.
>
> What seems like "hand-waving" here? The act of extracting a column of a
> relation as a multiset? I don't see why that requires any additional
> clarification.
That's it. I'm glad it was obvious.
> > Note that this doesn't cover all aggregates. We can't
> > do avg() with this approach. We could do these kinds
> > of aggregates with an accumulator, but then we'd
> > need some mechanism to apply a function to the
> > components of the accumulator to get the desired
> > result at the end.
>
> Indeed. In fact, avg() needs two evaluations: one to count the
> elements, and another to add them up. I spoke a friend in the local CS
> department here last night, and he proposed the definition you are
> responding to (which defines primitive aggregate functions, and then
> defines aggregate functions as simple combinations of the values of
> primitive aggregate functions) mainly in order to solve this problem.
There are other useful results from this model than just solving the "problem of avg()" though!
> > Count distinct is a bit of a bastard.
>
> As Bob mentioned, the distinct operator is more of a transformation on a
> projection of the relation than a transformation on an aggregate
> function. Although technically possible, I don't think it does anyone
> any good to define it as an aggregate function.
It does seem perverse.
Marshall Received on Thu Sep 21 2006 - 02:09:18 CEST