Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: Computing truncated mean
On Feb 7, 12:29 pm, Walt <walt_ask..._at_SHOESyahoo.com> wrote:
> Walt wrote:
> > John K. Hinsdale wrote:
> >>> On Feb 7, 7:50 am, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
>
> > Thanks guys. I'm testing your candidate scripts with the following data:
>
> > INSERT INTO T1 VALUES (1);
> > INSERT INTO T1 VALUES (2);
> > INSERT INTO T1 VALUES (3);
> > INSERT INTO T1 VALUES (4);
> > INSERT INTO T1 VALUES (5);
> > INSERT INTO T1 VALUES (10);
> > INSERT INTO T1 VALUES (10);
> > INSERT INTO T1 VALUES (10);
> > INSERT INTO T1 VALUES (10);
> > INSERT INTO T1 VALUES (10);
>
> > A 20% trim should include the middle six values and return a trimmed
> > average of 42/6 == 7.
>
> > So far, the only "formula" I've tried that gives the right answer is the
> > first one John posted (actually a slight modification thereof). The
> > others (see below) appear to have problems. I'll try to understand them
> > further to pinpoint the error.
>
> Ok, here's the deal. Trimming according to cume_dist or percent_ranking
> trims by value, not by record. That is, all the records with n=10 come
> and go as a group - either they're in the sample used to compute the
> average or they're not.
>
> John's first formula trims the top and bottom 20% of records, the others
> trim all records with a value between the 20th percentile and 80th
> percentile. Not the same thing (although it is if the values are
> distinct, as with some of the test data.)
>
> Perhaps this is due to some ambiguity in the way I stated the problem.
> If so, I apologize. A 20% truncated mean is the average over the middle
> 60% of *records*, not over all the records whose *values* fall in the
> the 20th and 80th percentile range.
>
> In my particular situation, I'm crunching some numbers from a survey
> where the only possible responses are 1 through 5. Trimming by value
> will result in all the 1s and 5s disappearing from the average, which is
> not at all what the client is looking for.
>
> //Walt- Hide quoted text -
>
> - Show quoted text -
Your clarification helps. What we need to do is something like this, which is similar to the PERCENT_RANK(), but 1 is added to the COUNT of the records:
SELECT
N,
(ROW_NUMBER() OVER (ORDER BY N))/(COUNT(*) OVER (PARTITION BY 1)+1)
POSITION_RANKING
FROM
T1;
N POSITION_RANKING
---------- ---------------- 1 .090909091 2 .181818182 3 .272727273 4 .363636364 5 .454545455 10 .545454545 10 .636363636 10 .727272727 10 .818181818 10 .909090909
Now, sliding into an inline view like before:
SELECT
AVG(N) AVERAGE,
COUNT(*) NUM_RECORDS,
MIN(N) MINIMUM,
MAX(N) MAXIMUM,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N) MIDPOINT_1,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY N DESC) MIDPOINT_2
FROM
(SELECT
N,
(ROW_NUMBER() OVER (ORDER BY N))/(COUNT(*) OVER (PARTITION BY
1)+1) POSITION_RANKING
FROM
T1)
WHERE
POSITION_RANKING BETWEEN .20 AND .80;
AVERAGE NUM_RECORDS MINIMUM MAXIMUM MIDPOINT_1 MIDPOINT_2
---------- ----------- ---------- ---------- ---------- ----------
7 6 3 10 5 10
Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.
Received on Wed Feb 07 2007 - 11:51:43 CST