Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Computing truncated mean

Re: Computing truncated mean

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 7 Feb 2007 11:51:44 -0800
Message-ID: <1170877904.543140.315080@j27g2000cwj.googlegroups.com>


On Feb 7, 2:32 pm, "John K. Hinsdale" <h..._at_alma.com> wrote:
> On Feb 7, 12:51 pm, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
>
> > Your clarification helps. What we need to do is something like this,
> > which is similar to the PERCENT_RANK(), but 1 is added to the COUNT of
> > the records:
>
> > SELECT
> > N,
> > (ROW_NUMBER() OVER (ORDER BY N))/(COUNT(*) OVER (PARTITION BY 1)+1)
> > POSITION_RANKING
>
> ... OK I think that is the best way. Works, & fast.
>
> [BTW, using just COUNT(*), instead of COUNT(*)+1, in the
> denominatorseems more intuitive to me. E.g. for ten rows
> the "rank" would rise in increments of 0.1 and not 1/9.
> In this case you would need to use the comparisons ">"
> and "<=" instead of BETWEEN. However I cannot find any cases
> for which the two methods produce differente results, at
> least for rowsets of size N or larger when you are "trimming"
> 1/N of rows on either end.]

The problem here is that we need the first row to be the same distance from 0 as the last row is from 1, and the same distance between each row, which is where the PERCENT_RANK() analytic function failed (last row is assigned a value of 1.0). By taking COUNT(*)+1, I am actually incrementing by 1/(10+1) = 1/11 = 0.090909091 for each row.

Just to keep things interesting:
INSERT INTO T1 VALUES (8); SELECT
  N,
  (ROW_NUMBER() OVER (ORDER BY N))/(COUNT(*) OVER (PARTITION BY 1)+1) POSITION_RANKING
FROM
  T1;

         N POSITION_RANKING

---------- ----------------
         1       .083333333
         2       .166666667
         3              .25
         4       .333333333
         5       .416666667
         8               .5
        10       .583333333
        10       .666666667
        10              .75
        10       .833333333
        10       .916666667

SELECT
   AVG(N) AVERAGE,
   COUNT(*) NUM_RECORDS,
   MIN(N) MINIMUM,
   MAX(N) MAXIMUM,
   PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N) MIDPOINT_1,    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N DESC) MIDPOINT_2 FROM
  (SELECT
    N,
    (ROW_NUMBER() OVER (ORDER BY N))/(COUNT(*) OVER (PARTITION BY 1)+1) POSITION_RANKING
  FROM
    T1)
WHERE
  POSITION_RANKING BETWEEN .20 AND .80;    AVERAGE NUM_RECORDS MINIMUM MAXIMUM MIDPOINT_1 MIDPOINT_2 ---------- ----------- ---------- ---------- ---------- ---------- 7.14285714 7 3 10 8 8

INSERT INTO T1 VALUES (11);
INSERT INTO T1 VALUES (12);    AVERAGE NUM_RECORDS MINIMUM MAXIMUM MIDPOINT_1 MIDPOINT_2 ---------- ----------- ---------- ---------- ---------- ---------- 7.77777778 9 3 10 10 10

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Wed Feb 07 2007 - 13:51:44 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US