Re: Idempotence and "Replication Insensitivity" are equivalent ?

From: Chris Smith <cdsmith_at_twu.net>
Date: Tue, 19 Sep 2006 16:17:04 -0600
Message-ID: <MPG.1f7a1a3fd8add32a98971f_at_news.altopia.net>


William Hughes <wpihughes_at_hotmail.com> wrote:
> First of all you don't want an aggregate function to be any
> function on M(A) (which is how Oracle apparently
> defines them). Okay call such functions turquoise functions.
> You define, without motivation, a subclass
> of turquoise functions which you call the aggregate functions.
> Are these supposed to be the efficiently computable turquoise
> functions? If so what is an efficient aggregate function?

Intuitively, they are those functions that can be evaluated incrementally over a stream of passing data, without the ability to "go back" and look at an earlier value once one has moved on to the next. Yes, the functions are defined that way because it's an efficient way to do things. The comment here is that variance can be computed that way (a surprise to me, but I'm no great student of statistics). Median, for example, cannot. (Median is widely cited as the most common kind of summary statistic that cannot be computed by an aggregate function, so I'm fairly confident in asserting that fact without proof.)
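
To make the variance remark concrete, here is a minimal sketch of a one-pass computation using Welford's update, which is one well-known way to do it and not necessarily the method meant in the earlier discussion. The class name RunningVariance and its step/result methods are hypothetical names chosen for illustration; the state kept between values is just three numbers.

```python
class RunningVariance:
    """One-pass population variance with constant-size state (Welford's update).

    Hypothetical illustration: the names here are not from any database API.
    """

    def __init__(self):
        self.count = 0    # number of values seen so far
        self.mean = 0.0   # running mean of the values seen so far
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def step(self, x):
        # Fold one value into the state; the value itself is not retained.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def result(self):
        # Population variance: divide by n (use n - 1 for the sample variance).
        if self.count == 0:
            raise ValueError("variance of an empty stream is undefined")
        return self.m2 / self.count


def variance_of_stream(values):
    # Consume any iterable exactly once, never looking back at earlier items.
    agg = RunningVariance()
    for x in values:
        agg.step(x)
    return agg.result()
```

Each value is examined once and then forgotten, which is the sense of "incremental" intended above.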

The confusion arose because it was proposed that one could (under the definitions given so far) write an aggregate function that makes a copy of the data, and then does whatever calculation it likes once it has all of it. Of course, that makes any function on the multiset possible, but it's not a useful kind of possible.
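
For illustration, the "copy everything first" trick might look like the sketch below. BufferingMedian and its step/result methods are hypothetical names, not part of any real aggregate interface. It fits the step-at-a-time shape, but its state grows with the input, so nothing is gained over handing the function the whole multiset.

```python
class BufferingMedian:
    """An 'aggregate' in form only: its state is a copy of the entire input.

    Hypothetical illustration: the names here are not from any database API.
    """

    def __init__(self):
        self.seen = []    # grows by one element per input value

    def step(self, x):
        # The "incremental" step does no real work; it only stores the value.
        self.seen.append(x)

    def result(self):
        # All of the actual computation happens here, over the stored copy.
        if not self.seen:
            raise ValueError("median of an empty stream is undefined")
        ordered = sorted(self.seen)
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2
```

The state here is as large as the data itself, which is exactly what the incremental definition is meant to rule out.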

-- 
Chris Smith
Received on Wed Sep 20 2006 - 00:17:04 CEST
