Re: Idempotence and "Replication Insensitivity" are equivalent ?

From: Marshall <marshall.spight_at_gmail.com>
Date: 21 Sep 2006 09:46:16 -0700
Message-ID: <1158857176.086894.324840_at_k70g2000cwa.googlegroups.com>


pamelafluente_at_libero.it wrote:
> Chris Smith wrote:
>
> > Marshall <marshall.spight_at_gmail.com> wrote:
>
> > I suppose no one could say you were wrong, but then COUNT would become
> > impossible to define under such a system.

[reordering]
[-sci.math]

> While this whole discussion about properties of functions
> defined in some (involved) way is fascinating and may have some pure
> math interest,
>
> I believe that for the purposes of DBMS aggregate functions
> it is of little importance.
>
> As I view it, the main concern, if any, should be the computational
> aspect.

I think you have it backwards. The model for aggregates that I presented
has little or no mathematical interest. Rather it exists for the purposes
of facilitating efficient implementations of aggregates. Its main concern
is the computational aspect.

> What I think I have heard, "let's not consider COUNTDISTINCT an
> aggregate function just because it does not fit in our fascinating math
> construction", would make any DBMS user laugh ...

Count distinct can certainly be considered an aggregate. There is the
question of whether it is a good choice for an abstraction given that
any relational language will have count and necessarily has project.
And there is the answer to that question, which is "no."

As I said before, SQL only has count distinct because it is so poor at composing expressions.
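
To make the composition concrete, here is a minimal sketch in Python. It is
not from the thread: the relation, the attribute names, and the helpers
`project` and `count` are all made up for illustration, with a relation
modeled as a set of tuples. Under those assumptions, "count distinct" is
just count applied to a projection.

```python
# Hypothetical toy model: a relation is a set of tuples plus a header
# naming the attributes. None of these names come from a real DBMS.

def project(relation, header, attrs):
    """Relational projection onto attrs. The result is a set, so
    duplicate rows collapse, which is what makes 'distinct' free."""
    idx = [header.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in relation}

def count(relation):
    """Plain count: the number of tuples in the relation."""
    return len(relation)

header = ("emp", "dept")
employees = {("alice", "sales"), ("bob", "sales"), ("carol", "ops")}

# Roughly what SQL spells as: SELECT COUNT(DISTINCT dept) FROM employees
distinct_depts = count(project(employees, header, ["dept"]))
print(distinct_depts)
```

Nothing beyond count and project is needed here, so a separate
count-distinct primitive adds no expressive power in a language that
composes these two freely.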

> I think that in practice a DBMS should implement, "by default",
> all the interesting aggregate functions that can be computed "in some
> efficient way".
>
> In addition it must give the possibility to the user, through custom
> code, to specify "any" aggregate function he wishes.

I agree on both counts.
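
As one illustration of the second point, here is a rough sketch of a
user-supplied aggregate using Python's standard `sqlite3` module, whose
`create_aggregate` accepts a class with `step` and `finalize` methods. The
aggregate itself (`value_range`, computing max minus min), the table, and
the data are invented for the example; other systems expose this kind of
hook differently.

```python
import sqlite3

class ValueRange:
    """Hypothetical custom aggregate: max(x) - min(x), ignoring NULLs."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):
        # Called once per input row.
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        # Called once at the end to produce the aggregate's result.
        return None if self.lo is None else self.hi - self.lo

conn = sqlite3.connect(":memory:")
conn.create_aggregate("value_range", 1, ValueRange)  # name, arg count, class
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (10,), (7,)])
result = conn.execute("SELECT value_range(x) FROM t").fetchone()[0]
print(result)
```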

> And here I literally mean anything. Anything that can be coded. It's up
> to the user to spend his time as he wishes.
>
>
> I would find it much more interesting to discuss here what
> "interesting" and "efficient way" mean (to decide on the "default
> aggregate functions").
>
> Does it mean bounded CC, does it mean restrictions on memory,
> does it mean that we should be able to process any stream of data
> without any post-processing, ...?
>
> What is your opinion?

I think it's awfully context-dependent.

Marshall

Received on Thu Sep 21 2006 - 18:46:16 CEST