Re: Idempotence and "Replication Insensitivity" are equivalent ?
Date: 19 Sep 2006 16:42:03 -0700
Message-ID: <1158709323.167127.326340_at_i42g2000cwa.googlegroups.com>
Chris Smith wrote:
> William Hughes <wpihughes_at_hotmail.com> wrote:
> > First of all you don't want an aggregate function to be any
> > function on M(A) (which is how Oracle apparently
> > defines them). Okay call such functions turquoise functions.
> > You define, without motivation, a subclass
> > of turquoise functions which you call the aggregate functions.
> > Are these supposed to be the efficiently computable turqoise
> > functions? If so what is an efficient aggregate function?
>
> Intuitively, they are those functions which can be evaluated
> incrementally by examining a stream of passing data, without the ability
> to "go back" and look at one result after one has already moved on and
> looked at the next one.
A couple of vital restrictions.
-only a single pass is allowed
-only a limited amount of memory is allowed.
Moreover, consider the task, "find the five largest elements of this
list".
This can be done in a single pass, using only a small amount
of memory. Yet this is not an aggregate function under
your definition.
>Yes, the functions are defined that way because
> it's an efficient way to do things. The comment here is that variance
> can be done that way (a surprise to me, but I'm no great student of
> statistics).
You don't need to be. The fact that variance is a function of the first couple of moments is elementary. (However, questions of numerical stability muat be dealt with).
>Median, for example, cannot. (Median is widely cited as
> the most popular kind of summary data that cannot be computed by an
> aggregate function, so I'm fairly confident in asserting that fact
> without proof.)
Yes, Median cannot be put into the form of your aggregate function. Despite this median is very important and can be computed efficiently. So your aggregate functions do not contain all the interesting functions, nor all the effeciently computable functions.
Why do you insist on restricting the name "aggregate function"? You have not aswered the question "What is an efficient aggregate function?".
>
> The confusion here is that it was proposed that one could (under the
> definitions expressed so far) write an aggregate function that makes a
> copy of the data, and then does whatever calculation it likes after it's
> got all the data.
No you can't (at least under your definition). An aggregate function maps to A, a copy function maps to M(A).
-William Hughes
Received on Wed Sep 20 2006 - 01:42:03 CEST
