| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> c.d.o.server -> Re: Computing truncated mean
On Feb 7, 7:50 am, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> On Feb 6, 4:40 pm, Walt <walt_ask..._at_SHOESyahoo.com> wrote:
>
> > A truncated mean or a trimmed average is the average after throwing away
> > the top and bottom n percent.
>
> > I seem to recall a built-in function in Oracle that did this fairly
> > simply, but I can't seem to locate it in the docs. I could cook up my
> > own in PL-SQL, but if there's something already out there I'd prefer to
> > use it. Anybody know a simple way to do it?
>
> > Thanks.
>
> > //Walt
>
> > W2k3, Ora 9.2 if that matters...
>
> Set up a test case:
> CREATE TABLE T1 (N NUMBER(10));
>
> INSERT INTO T1 VALUES (1);
> INSERT INTO T1 VALUES (2);
> INSERT INTO T1 VALUES (8);
> INSERT INTO T1 VALUES (10);
> INSERT INTO T1 VALUES (15);
> INSERT INTO T1 VALUES (20);
> INSERT INTO T1 VALUES (34);
> INSERT INTO T1 VALUES (40);
> INSERT INTO T1 VALUES (55);
> INSERT INTO T1 VALUES (18);
> INSERT INTO T1 VALUES (22);
> INSERT INTO T1 VALUES (28);
> INSERT INTO T1 VALUES (32);
> INSERT INTO T1 VALUES (40);
> INSERT INTO T1 VALUES (49);
> INSERT INTO T1 VALUES (50);
> INSERT INTO T1 VALUES (80);
> INSERT INTO T1 VALUES (200);
> INSERT INTO T1 VALUES (201);
> INSERT INTO T1 VALUES (210);
>
> SELECT
> *
> FROM
> T1
> ORDER BY
> N;
>
> N
> ----------
> 1
> 2
> 8
> 10
> 15
> 18
> 20
> 22
> 28
> 32
> 34
> 40
> 40
> 49
> 50
> 55
> 80
> 200
> 201
> 210
>
> Now, an experiement to find the point in our table that represents the
> bottom 20% (START_POINT) and top 20% (END_POINT):
> SELECT
> PERCENTILE_DISC(0.20) WITHIN GROUP (ORDER BY N) START_POINT,
> PERCENTILE_DISC(0.20) WITHIN GROUP (ORDER BY N DESC) END_POINT
> FROM
> T1
> GROUP BY
> 1;
>
> START_POINT END_POINT
> ----------- ----------
> 10 80
>
> Now that we have the end points defined, we need a way to limit the
> input into the AVG function so that only the rows that contain column
> values between the two extremes are retrieved. Sliding the above into
> an inline view and joining it to our original table solves the
> problem:
> SELECT
> AVG(N) AVERAGE,
> COUNT(*) NUM_RECORDS,
> MIN(N) MINIMUM,
> MAX(N) MAXIMUM,
> PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N) MIDPOINT_1,
> PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N DESC) MIDPOINT_2
> FROM
> T1,
> (SELECT
> PERCENTILE_DISC(0.2) WITHIN GROUP (ORDER BY N) START_POINT,
> PERCENTILE_DISC(0.2) WITHIN GROUP (ORDER BY N DESC) END_POINT
> FROM
> T1
> GROUP BY
> 1) L
> WHERE
> T1.N BETWEEN L.START_POINT AND L.END_POINT;
>
> AVERAGE NUM_RECORDS MINIMUM MAXIMUM MIDPOINT_1 MIDPOINT_2
> ---------- ----------- ---------- ---------- ---------- ----------
> 35.2142857 14 10 80 32 34
>
> After throwing out those values outside the 20% to 80% range, our
> average is 35.21, there are 14 records, the minimum is 10, maximum is
> 80, the first number in the middle of the group sorted from lowest to
> highest is 32, and the first number in the middle of the group sorted
> from highest to lowest is 34.
>
> Charles Hooper
> PC Support Specialist
> K&M Machine-Fabricating, Inc.
Continuing with another example that returns a different result (which
right answer do you want?):
SELECT
N,
PERCENT_RANK() OVER (ORDER BY N) RANKING
FROM
T1
ORDER BY
N;
N RANKING
---------- ----------
2 .052631579
8 .105263158
10 .157894737
15 .210526316
18 .263157895
20 .315789474
22 .368421053
28 .421052632
32 .473684211
34 .526315789
40 .578947368
40 .578947368
49 .684210526
50 .736842105
55 .789473684
80 .842105263
200 .894736842
201 .947368421
210 1
You will note now that the between 20% and 80% rule will include those
values between 15 and 55, if we use the above as a row limiter:
SELECT
AVG(N) AVERAGE,
COUNT(*) NUM_RECORDS,
MIN(N) MINIMUM,
MAX(N) MAXIMUM,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N) MIDPOINT_1,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N DESC) MIDPOINT_2
FROM
(SELECT
N,
PERCENT_RANK() OVER (ORDER BY N) RANKING
FROM
T1)
WHERE
RANKING BETWEEN .20 AND .80;
AVERAGE NUM_RECORDS MINIMUM MAXIMUM MIDPOINT_1 MIDPOINT_2
---------- ----------- ---------- ---------- ---------- ----------
33.5833333 12 15 55 32 34
After throwing out those values outside the 20% to 80% range {values in () indicate previous results}, our average is 33.58 (35.21), there are 12 (14) records, the minimum is 15 (10), maximum is 55 (80), the first number in the middle of the group sorted from lowest to highest is 32 (32), and the first number in the middle of the group sorted from highest to lowest is 34 (34).
Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.
Received on Wed Feb 07 2007 - 07:08:43 CST
![]() |
![]() |