Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: computing median

Re: computing median

From: Dave Fowler <dbf57_at_earthlink.net>
Date: Sat, 28 Apr 2001 13:06:51 GMT
Message-ID: <LfzG6.1152$dH5.148583@newsread2.prod.itd.earthlink.net>

Frank,
I think you are mistaken on how Cume_dist is used in my example or maybe it wasn't a clear example.Col_distvalue is the constant. New example: if you want the median salary for a Job_title (the constant). example data
Job_title salary
Analyst 45,000

Analyst     45,000
Analyst       2,000     (Short term contract)
Analyst     55,000
Anaylst     60,000

Median 45,000

Cursor query for loop to find the cume_dist of .5

SELECT job_title,salary,count(*) count
, CUME_DIST() OVER (PARTITION BY job_title ORDER BY salary) AS cume_dist

If you don't have a constant make one up to use with your numbers needing the median value which is what I did.
I doesn't work in your example since their isn't a constant. I made the same mistake first.
Again attention to detail and how the function works.

hth,
Dave Fowler

"Frank Hubeny" <fhubeny_at_ntsource.com> wrote in message news:3AE7BF12.6CCE17B5_at_ntsource.com...
> I would agree that using Excel is not the best way to get the median,
 especially
> if there are millions of records involved. However, the suggestion to use
 the
> cume_dist function does not always give the expected answer. (Perhaps I
 just
> don't understand how it is being used.)
>
> Suppose I have a table, called testmedian with one column, A, and it
 contains
> two rows with 1 and 10 as values for A. The median of 1 and 10 is their
 average
> or 5.5. (This is what Excel says it is at any rate.)
>
> However, when I do the cume_dist over A, I get the following:
>
> SQL> select a, cume_dist() over (order by a) result from testmedian;
>
> A RESULT
> ---------- ----------
> 1 .5
> 10 1
>
> This would imply that 1 is the median, which is incorrect.
>
> It is also possible that cume_dist does not return any value at the .5
 level.
> This would occur, for example, if the numbers in the table were 1,1,10.
>
> SQL> select a, cume_dist() over (order by a) result from testmedian;
>
> A RESULT
> ---------- ----------
> 1 .666666667
> 1 .666666667
> 10 1
>
> One of the earlier respondants suggested counting the number of rows and
> selecting the median by fetching the appropriate number of ordered rows.
 If the
> number of rows is even, the median would be the average value of the field
 of
> interest in the two central rows. Otherwise it is the value of the field
 of
> interest in the central row.
>
> This appears to be the most straightforward way to get the median. It is
> probably also faster than using Excel.
>
> Frank Hubeny
>
> Dave Fowler wrote:
>
> > Herwig,
> > Try this. Build pl/sql function passing value for col_distvalue around
> > this cursor....This function could be as dynamic as you want. In the
 loop
> > looking at rows Get the Col1 row with column cume_dist value of .5 it
 works
> > for even/odd number of total rows. Return the col1 value.
> >
> > Test data Col_distvalue is varchar2 of the same value e.g. 'X' col1 is
> > number
> >
> > SELECT col_distvalue,COL1,count(*) count
> > , CUME_DIST() OVER (PARTITION BY col_distvalue ORDER BY col1) AS
 cume_dist
> > FROM mediantest
> > group by col_distvalue,col1
> >
> > I was suprised a median function hadn't been written yet either with all
 the
> > advanced stats for OLAP services in Oracle 8.
> >
> > This gets it done competely in Oracle enviroment without EXCEL or other
> > products involved and returns it with a select statement.
> >
> > hth,
> > Dave Fowler
> >
> > "Herwig Henseler" <hh_at_ecce-terram.de> wrote in message
> > news:hh-2D0C99.18481320042001_at_news.f.via-net-works.de...
> > > Hi experts,
> > >
> > > [Oracle 8i 8.1.7]
> > >
> > > How do I compute the Median of values in a table? I guess this must be
 a
> > > FAQ but haven't found an answer so far. The median is the value in the
> > > "middle" of a sorted list of values. That is, half of all values are
> > > bigger and the other half is smaller than the median.
> > >
> > > TIA,
> > > Herwig
> > >
> > > --
> > > ECCE TERRAM Internet Services GmbH Dr. Herwig Henseler
> > > E-Business & Webagentur Tel. 0441 / 500 12-0
> > > Heinrichstr. 18d Fax. 0441 / 500 12-29
> > > 26131 Oldenburg URL: http://www.ecce-terram.de
> > > "Since I've solved all our Y2K problems, I will be on vacation from
> > > 28th Dec 1999 to 6th Jan 1900."
> > >
>
>
Received on Sat Apr 28 2001 - 08:06:51 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US