Re: Idempotence and "Replication Insensitivity" are equivalent ?

From: Chris Smith <cdsmith_at_twu.net>
Date: Wed, 20 Sep 2006 09:39:49 -0600
Message-ID: <MPG.1f7b0eb31fb9d6f7989721_at_news.altopia.net>


William Hughes <wpihughes_at_hotmail.com> wrote:
> OK. You are following Marshall's nomenclature. Presumably
> Marshal has some reason for restricing aggregate functions.
> But you do not seem to know what this is.

My assumption is that Marshall is restricting aggregate functions (I think it's clear, now, that except for mistakes on my part, it's really no restriction at all) because it's generally done that way. That's the same reason that I had an understanding of what he meant when he did so.

> As has been made clear the statement is fine as long as
> we restrict the concept of aggregate function. So the
> qestion of why we do this becomes central.

Well, the statement is obviously true for some restrictions of aggregate functions. It is not, however, true for all such restrictions. In particular, it isn't true for the definition that we seem to have agreed on at the moment. (Specifically, it is guaranteed to be true when the initial value x_0 of the binary function is an identity; but Pamela gave a counterexample for the case where x_0 is not an identity of the binary function.) As far as I can see, that's where we are now.

> That the study of functions can be writting in the form
> g(x1,g(x2,g(x3,...g(xn,e)) is standard is reasonable. To suggest that
> only these functions are interesting is not.

I'm no longer of the opinion that it makes sense to distinguish between functions that "can" or "can't" be written that way. I can only distinguish between those functions that can or can't be written that way in a manner that uses constant space, or for that matter any other space complexity.

For example, this web page talks about implementing median as an aggregate function in PostgreSQL. PostgreSQL follows the definition given here. (Perhaps that's part of the reason you're looking for to define aggregate functions that way? You don't seem to be too interested in the fact that it's the normal way to do things.) To translate a bit, its "sfunc" or "state transition function" is my "g". Its "stype" or "state type" is Marshall's B. Its "finalfunc" represents the ability to do post-processing on the results of the aggregate function. (I made the distinction between a primitive aggregate function and a general aggregate to allow this post-processing.) PostgreSQL's "basetype" is Marshall's A. The x_0 value is always NULL, which is a kind of special value in SQL.

http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html

The details are hidden, because the procedural language PL/R has a "median" function that computes the median of an array, once the values have been collected. PL/R also provides a function called "plr_array_accum" which represents the binary function g(P,b) = P \uplus {b}, where \uplus preserves duplicates when it merges the two multisets. That does all the hard work for us, but this is a definition of median that fits the most recent definition given here.

> Here we have no restriction whatsoever on the output of an aggregate
> function. This seems much more reasonable than your very
> restrictive definition in which an aggregate function could only map
> into A. Still there is a second more fundamental issue. What could
> the binary form of a function that find the largest five elements be?

This shouldn't be hard. A is the type of the values being aggregated, and B is the result type of the aggregate function. We need one special value representing negative infinity, so I'll define a new set D as the real numbers augmented with negative infinity.

Now let B = D^5. x_0 is (-inf, -inf, -inf, -inf, -inf). The binary function g is defined as such. g(P,x) = P, if x is less than or equal to all members of P. Otherwise, g(P,x) is the result of replacing a minimal member of P with x.

> Only partially. It does remove the difficulty I mentioned. However,
> even after making the change (which involves changing the range of g)
> you still can not produce a copy function (look at the domain of g).

B      = M(A)
x_0    = {}
g(P,x) = P \uplus {x}

-- 
Chris Smith
Received on Wed Sep 20 2006 - 17:39:49 CEST

Original text of this message