Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Computing truncated mean

Re: Computing truncated mean

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 7 Feb 2007 05:08:43 -0800
Message-ID: <1170853723.837196.173910@h3g2000cwc.googlegroups.com>


On Feb 7, 7:50 am, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> On Feb 6, 4:40 pm, Walt <walt_ask..._at_SHOESyahoo.com> wrote:
>
> > A truncated mean or a trimmed average is the average after throwing away
> > the top and bottom n percent.
>
> > I seem to recall a built-in function in Oracle that did this fairly
> > simply, but I can't seem to locate it in the docs. I could cook up my
> > own in PL-SQL, but if there's something already out there I'd prefer to
> > use it. Anybody know a simple way to do it?
>
> > Thanks.
>
> > //Walt
>
> > W2k3, Ora 9.2 if that matters...
>
> Set up a test case:
> CREATE TABLE T1 (N NUMBER(10));
>
> INSERT INTO T1 VALUES (1);
> INSERT INTO T1 VALUES (2);
> INSERT INTO T1 VALUES (8);
> INSERT INTO T1 VALUES (10);
> INSERT INTO T1 VALUES (15);
> INSERT INTO T1 VALUES (20);
> INSERT INTO T1 VALUES (34);
> INSERT INTO T1 VALUES (40);
> INSERT INTO T1 VALUES (55);
> INSERT INTO T1 VALUES (18);
> INSERT INTO T1 VALUES (22);
> INSERT INTO T1 VALUES (28);
> INSERT INTO T1 VALUES (32);
> INSERT INTO T1 VALUES (40);
> INSERT INTO T1 VALUES (49);
> INSERT INTO T1 VALUES (50);
> INSERT INTO T1 VALUES (80);
> INSERT INTO T1 VALUES (200);
> INSERT INTO T1 VALUES (201);
> INSERT INTO T1 VALUES (210);
>
> SELECT
> *
> FROM
> T1
> ORDER BY
> N;
>
> N
> ----------
> 1
> 2
> 8
> 10
> 15
> 18
> 20
> 22
> 28
> 32
> 34
> 40
> 40
> 49
> 50
> 55
> 80
> 200
> 201
> 210
>
> Now, an experiement to find the point in our table that represents the
> bottom 20% (START_POINT) and top 20% (END_POINT):
> SELECT
> PERCENTILE_DISC(0.20) WITHIN GROUP (ORDER BY N) START_POINT,
> PERCENTILE_DISC(0.20) WITHIN GROUP (ORDER BY N DESC) END_POINT
> FROM
> T1
> GROUP BY
> 1;
>
> START_POINT END_POINT
> ----------- ----------
> 10 80
>
> Now that we have the end points defined, we need a way to limit the
> input into the AVG function so that only the rows that contain column
> values between the two extremes are retrieved. Sliding the above into
> an inline view and joining it to our original table solves the
> problem:
> SELECT
> AVG(N) AVERAGE,
> COUNT(*) NUM_RECORDS,
> MIN(N) MINIMUM,
> MAX(N) MAXIMUM,
> PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N) MIDPOINT_1,
> PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N DESC) MIDPOINT_2
> FROM
> T1,
> (SELECT
> PERCENTILE_DISC(0.2) WITHIN GROUP (ORDER BY N) START_POINT,
> PERCENTILE_DISC(0.2) WITHIN GROUP (ORDER BY N DESC) END_POINT
> FROM
> T1
> GROUP BY
> 1) L
> WHERE
> T1.N BETWEEN L.START_POINT AND L.END_POINT;
>
> AVERAGE NUM_RECORDS MINIMUM MAXIMUM MIDPOINT_1 MIDPOINT_2
> ---------- ----------- ---------- ---------- ---------- ----------
> 35.2142857 14 10 80 32 34
>
> After throwing out those values outside the 20% to 80% range, our
> average is 35.21, there are 14 records, the minimum is 10, maximum is
> 80, the first number in the middle of the group sorted from lowest to
> highest is 32, and the first number in the middle of the group sorted
> from highest to lowest is 34.
>
> Charles Hooper
> PC Support Specialist
> K&M Machine-Fabricating, Inc.

Continuing with another example that returns a different result (which right answer do you want?):
SELECT
  N,
  PERCENT_RANK() OVER (ORDER BY N) RANKING FROM
  T1
ORDER BY
  N;

         N RANKING
---------- ----------

         2 .052631579
         8 .105263158
        10 .157894737
        15 .210526316
        18 .263157895
        20 .315789474
        22 .368421053
        28 .421052632
        32 .473684211
        34 .526315789
        40 .578947368
        40 .578947368
        49 .684210526
        50 .736842105
        55 .789473684
        80 .842105263
       200 .894736842
       201 .947368421
       210          1

You will note now that the between 20% and 80% rule will include those values between 15 and 55, if we use the above as a row limiter: SELECT
  AVG(N) AVERAGE,
  COUNT(*) NUM_RECORDS,
  MIN(N) MINIMUM,
  MAX(N) MAXIMUM,
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N) MIDPOINT_1,   PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N DESC) MIDPOINT_2 FROM
  (SELECT
    N,
    PERCENT_RANK() OVER (ORDER BY N) RANKING   FROM
    T1)
WHERE
  RANKING BETWEEN .20 AND .80;    AVERAGE NUM_RECORDS MINIMUM MAXIMUM MIDPOINT_1 MIDPOINT_2 ---------- ----------- ---------- ---------- ---------- ---------- 33.5833333 12 15 55 32 34

After throwing out those values outside the 20% to 80% range {values in () indicate previous results}, our average is 33.58 (35.21), there are 12 (14) records, the minimum is 15 (10), maximum is 55 (80), the first number in the middle of the group sorted from lowest to highest is 32 (32), and the first number in the middle of the group sorted from highest to lowest is 34 (34).

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Wed Feb 07 2007 - 07:08:43 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US