Re: Idempotence and "Replication Insensitivity" are equivalent ?

From: William Hughes <wpihughes_at_hotmail.com>
Date: 20 Sep 2006 05:26:44 -0700
Message-ID: <1158755204.820667.82110_at_k70g2000cwa.googlegroups.com>


pamelafluente_at_libero.it wrote:
> William Hughes ha scritto:
>
> > That the study of functions can be writting in the form
> > g(x1,g(x2,g(x3,...g(xn,e)) is standard is reasonable. To suggest that
> > only these functions are interesting is not. This is what you
> > seem to be doing by restricting the use of the term aggregate function.
> ...
> ..
> ..
> > No, but this is a data point that suggests the standard usage
> > of aggregate function is non-restrictive.
> ...
> >
> > Here we have no restriction whatsoever on the output of an aggregate
> > function. This seems much more reasonable than your very
> > restrictive definition in which an aggregate function could only map
> > into A. Still there is a second more fundamental issue. What could
> > the binary form of a function that find the largest five elements be?
> >
> ..
> > > > Yes, Median cannot be put into the form of your
> > > > aggregate function. Despite this median is very important
> > > > and can be computed efficiently.
> > >
> > > Can you clarify what you mean by computed efficiently? Perhaps I'm
> > > wrong again.
> > >
> >
> > Just that it is practical to compute the median routinely. I did
> > not have anything more formal in mind. My point is that
> > the median function is clearly of practical importance,
> > so defining aggregate function in such a way as to exclude
> > the median needs justification.
> >
>
>
> Actually,
>
> Median can be computed O(n) like mean.
> e.g.
> http://valis.cs.uiuc.edu/~sariel/research/CG/applets/linear_prog/median.html
>
> Further, memory is a problem, I could show you at least one algorithm
> which
> compute the median on a stream of data (no need to sort them)
>

[I am aware of a few approaches to the median problem.] As far as the present discussion goes we only need to know that:

           -the median is important
           -it is practical to compute the median
           -the median cannot be computed by iterating
             a binary function

> What is this g(x1,g(x2,g(x3,...g(xn,e)))) why you need that for?
>

Just the functional form of a function that can be evaluated by iterating a binary function. [ Let's have a discussion about the relative merits of functional and non-functional computer programming. On second thought, let's not!]

> If I should code according the above, it would take forever to compute
> things, and then do not complain dbms are slow !

Probably not. Either you or your compiler should convert this to an iterative form.

>
> I find a little weird that some db experts are not familiar with
> computational complexity of the most common functions.
>
> The discussion about what is an aggregate function probably should not
>
> be done on a pure theoretical perpective. Oracle nor I am going to
> define an aggregate function your way. We do the software and it is
> the other way round ;) You must keep up ;)
>

Agreed, but let's not get too hung up on nomenclature. If it is standard to use "aggregate function" in a restricted form so be it (I can make a very good case that "Wilson's Theorem" should not be called "Wilson's Theorem", but I have no inclination to take on the (probably hopeless) task of changing the standard nomenclature). The more important question is "given that as a practical matter, functions on M(A) that cannot be reduced to binary form must be studied, does it make sense to restrict consideration to functions that can be reduced to binary form.?" The default answer is no. The actual answer to may be "yes" (after all the functions that can be reduced to a binary form are are clearly an important class of functions), but this requires justification and to this point there has been none.

> If it takes O(log n) instead of O(n) and the user is willing
> to wait a little more because he needs that result, what are you going
> to do ? (Of course, I agree, forget about quadratic complexity, I
> didn't
> reach that far)
>
> Are you going to tell him: "No sorry, I will not put this function
> in my software because someone say that this is not an aggregate
> function" ?
>
> You surely can. But then do not complain if your competitors make more
> money
> than you :)
>
> Common sense would define an aggregate function as any function
> whose best ratio
>
> computational complexity / market request
>
> is under some threshold. :)) (quite a provocation eh? )
>

This would make the definition of aggregate function dependent on the state of the art. Perhaps not what you want. I would define aggregate function to mean any function on M(A). Then you would have "commercially viable aggregate functions", "binary reducible aggregate functions", "interesting aggregate functions" and so on. Note that common sense does not always drive nomenclature.

  • William Hughes
Received on Wed Sep 20 2006 - 14:26:44 CEST

Original text of this message