Re: median value

From: Rob Johnson <jaws_at_southwind.net>
Date: 1996/12/14
Message-ID: <32B33191.79F2_at_southwind.net>#1/1


Try this, where the table is table1 and the column is col1:

select avg(a.col1)
from
table1 a, table1 b
having
 (ABS(SUM(SIGN(a.col1 - b.col1))) - SUM(DECODE(b.col1,a.col1,1,0)) <= 0) group by a.col1

I tested this on a small table, and it worked OK. Here are some examples of how it works. Note that each column value will appear only once, because we are grouping by column values, and the ABS(...),SUM(...), and (ABS(...) - SUM(...)) values are group by values.

Example 1: column values are 1,2,3,4,24

VALUE of a.col1                        1        2        3       
4       24
ABS(SUM(SIGN(a.col1-b.col1)))          4        2        0       
2        4
SUM(DECODE(b.col1,a.col1,1,0))         1        1        1       
1        1
ABS(...) - SUM(...)                    3        1       -1       
1        3
selected for avg function?             N        N        Y       
N        N

avg(a.col1) = 3, the median value

Example 2: column values are 1,2,3,10

VALUE of a.col1                        1        2        3       10 
ABS(SUM(SIGN(a.col1-b.col1)))          3        1        1        3
SUM(DECODE(b.col1,a.col1,1,0))         1        1        1        1
ABS(...) - SUM(...)                    2        0        0        2
selected for avg function?             N        Y        Y        N

avg(a.col1) = (3 + 2)/2 = 2.5, the median value

Example 3: column values are 1,2,2,3,4,5,6

VALUE of a.col1                        1       2       3       4     
5      6
ABS(SUM(SIGN(a.col1-b.col1)))          6      10       0       2     
4      6
SUM(DECODE(b.col1,a.col1,1,0))         1       4       1       1     
1      1
ABS(...) - SUM(...)                    5       8      -1       1     
3      5
selected for avg function?             N       N       Y       N     
N      N

avg(a.col1) = 3, the median value. Note here that ABS(...) = 10 for the value 2, because a.col1 = 2 twice on the list. SUM(...)=4 for a.col1=2 for the same reason, because 2 appears twice on the list of column values. In general,

ABS(SUM(SIGN(a.col1-b.col1)) = ABS(#elements greater than a.col1 -

                                     #elements less than a.col1)
                                   *
                                 (#times a.col1 appears on the list)

and

SUM(DECODE(b.col1,a.col1,1,0)) = (#times a.col1 appears on list squared)

Example 4: column values are 1,5,5,10,10,10

VALUE of a.col1                       1         5         10
ABS(SUM(SIGN(a.col1-b.col1)))         5         4          9
SUM(DECODE(b.col1,a.col1,1,0))        1         4          9
ABS(...) - SUM(...)                   4         0          0
selected for avg function?            N         Y          Y

avg(a.col1) = (5 + 10)/2 = 7.5, the median value.

While this algorithm works, it involves n squared comparisons for a table with n values, which may be too many for a large table. If there's a way to do it, the better thing to do would be to sort the rows by col1 and then find the one(s) in the middle of the sorted list, because sorting is an (nlogn)-complexity algorithm and would be much faster.

But I don't know how to do that. :)

Lisa Kilroy wrote:
>
> Ron wrote:
> >
> > Roger Wernersson wrote:
> > >
> > > Lisa Kilroy wrote:
> > > > I need to pull the median value out of a table for a particular field.
> > >
> > > Hello Lisa, no problem!
> > >
> > > SELECT MAX (value)
> > > FROM table
> > > WHERE value <= (
> > > SELECT AVG (value)
> > > FROM table
> > > )
> > >
> > > This might not be the fastest way but it works.
> > >
> > > --
> > > Sport radio: people listening to people watching people having fun
> > > Mailto:roger.wernersson_at_adra.se
> > > BTW: All opinions are mine, all mine, and nobody's but mine.
> >
> > Nice try but this won't work. Example set of values:
> > 3, 3, 3, 1
> > The average is 2.5 and the max(value) = 1. The median is 3.
> >
> > Ron
Received on Sat Dec 14 1996 - 00:00:00 CET

Original text of this message