Re: Idempotence and "Replication Insensitivity" are equivalent ?

From: <pamelafluente_at_libero.it>
Date: 19 Sep 2006 08:12:41 -0700
Message-ID: <1158678761.385503.267010_at_e3g2000cwe.googlegroups.com>


Chris Smith ha scritto:

> > - median
> > - Any quantile
> > - standard deviation
> > - variance
>
> These are not definable as aggregate functions.

 No? Really?

http://www.adp-gmbh.ch/ora/sql/agg/index.html

>
> > - regression coefficient
> > - Pearson's skewness
>
> I don't know what these are.

I believe you.
>
> --
> Chris Smith

  I propose a counterexample, specific to the case of   Marshall's concern:

  Consider a commutative, associative Aggregate function f   "expressible through a a binary generator", in the sense   that it allows the following representation:

  f(x,y,z) = f( f (x,y) , z)

 Assume that we restrict ourself to the space {0, 1}

 Consider f := [Product of operands], which allows the above representation

 We have

1.

    f(x,y) = x

   does not hold, e.g. f(0, 1) = 0

   Therefore f is NOT idempotent

2.

   f is "duplication insensitive". Infact:

   f can take values only in {0, 1}

   if f(.) is 0 this result cannot be changed by any further operands    if f(.) is 1 it means that all operands are 1. Hence the duplication    of an operand (a new 1) does not change the result.

   Therefore we have provided, a case where "NOT idempotent"    and "duplication insensitivity" co-exist.

   Hence "duplication insensitivity" => idempotence does not hold.

   Did I do right?

-P

PS.

1.
Restricting aggregate function to above contraints is in real word dbms' is useless, imho.

I have coded countless times aggegate functions such as:

  • Ranges
  • Quantiles
  • standard deviation / variance
  • regression coefficient
  • any moment (from origin or central)
  • Any statistical estimator and especially "sufficient" statistics ...

 why should we care about requiring the above representation?  People want complex KPIs, who cares about SUM and COUNT ?

2.
avg(x) = sum(x) / count(x)

average is not computed that way

http://www.databasejournal.com/features/mssql/article.php/3399931

"The reason for the different results is that most aggregate functions ignore columns that contain a NULL value, where as the COUNT function does not ignore columns that have a NULL value. Because the AVG function
ignores columns with a NULL value, the row where the color was "White" and
the number value was NULL was not considered when calculating the average for
the color "White". However, since the COUNT function does not ignore columns
that contain a NULL value, the row that contained a NULL value was considered
when the SUM and COUNT functions where used to calculate the average. " Received on Tue Sep 19 2006 - 17:12:41 CEST

Original text of this message