Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> RE: Calculating The Median: Error Discovered in Oracle SQL 101 Code

RE: Calculating The Median: Error Discovered in Oracle SQL 101 Code

From: Larry Elkins <elkinsl_at_flash.net>
Date: Mon, 02 Jul 2001 06:36:40 -0700
Message-ID: <F001.0033F1C3.20010702064649@fatcity.com>

Ian,

Nice stuff!

Since this subject was brought back up, I thought maybe some would be interested in the following. I've never had a need to calculate a median, but, I knew Celko's SQL for Smarties had a few variations and examples from various people, each with caveats. And then there were differences between what he termed statistical and financial mean, and some other things as well. Anyway, a google search turned up another Celko solution. And this one also brings up the concept of weighted median.

Here is his example,
http://www.intelligententerprise.com/db_area/archives/1999/992004/celko.shtm l, modified by me to use the standard EMP table's SAL column:

SQL> SELECT AVG(DISTINCT x.sal)
  2 FROM (SELECT F1.sal

  3            FROM emp F1, emp F2
  4           GROUP BY F1.empno, F1.sal
  5          HAVING SUM(CASE WHEN F2.sal = F1.sal
  6                           THEN 1 ELSE 0 END)
  7          >= ABS(SUM(CASE WHEN F2.sal < F1.sal THEN 1
  8                          WHEN F2.sal > F1.sal THEN -1
  9                           ELSE 0 END)))
 10          X

 11 /

AVG(DISTINCTX.SAL)


              1550

The link above goes into some detail regarding the logic behind the query and how his query finally reached the form above. I may never need to do a median, but, this subject has been a good opportunity for learning. I've tested the above with even, odd, multiple occurences of SAL, null,s etc. It seems to work, but, everyone have a whack at it if you like.

Regards,

Larry G. Elkins
elkinsl_at_flash.net

> -----Original Message-----
> From: root_at_fatcity.com [mailto:root_at_fatcity.com]On Behalf Of MacGregor,
> Ian A.
> Sent: Saturday, June 30, 2001 9:00 PM
> To: Multiple recipients of list ORACLE-L
> Subject: Calculating The Median: Error Discovered in Oracle SQL 101 Code
>
>
> I ran the code from Oracle SQL 101 which Jared posted modifying
> it to find the median of the sal column on that table
>
> SQL> select
> 2 rownum,
> 3 sal
> 4 from (
> 5 select sal
> 6 from scott.emp
> 7 where sal is not null
> 8 union
> 9 select 1 from dual where 1=2
> 10 )
> 11 group by sal, rownum
> 12 having rownum >= (
> 13 select decode( mod(total_freq,2),
> 14 1,trunc(total_freq/2 + 1),
> 15 0,trunc(total_freq/2)
> 16 )
> 17 from (
> 18 select count(*) total_freq
> 19 from scott.emp
> 20 where sal is not null
> 21 )
> 22 )
> 23 and rownum <= (
> 24 select decode( mod(total_freq,2),
> 25 1,trunc(total_freq/2 + 1),
> 26 0,trunc(total_freq/2 + 1)
> 27 )
> 28 from (
> 29 select count(*) total_freq
> 30 from scott.emp
> 31 where sal is not null
> 32 )
> 33 )
> 34 /
>
> values
> averaged
> ROWNUM in median
> ---------- ----------
> 7 1600
> 8 2450
> ----------
> Median 2025
>
> ------------------------------------------------------------------
> --------------------
> This answer is different from the result of the code I posted
> which uses the new analytical functions.
>
> select
> case
> when mod(number_salaried,2) = 0 then
> (select sum(sal)/2 from(select sal, row_number()
> over ( order by sal) as salrank
> from scott.emp)
> where salrank = number_salaried/2
> or salrank = number_salaried/2 +1)
> else
> (select sal from(select sal, row_number()
> over ( order by sal) as salrank
> from scott.emp)
> where salrank = ceil(number_salaried/2))
> end median
> from (select sal,rank() over (order by sal) as rk from scott.emp),
> (select count(sal) number_salaried from scott.emp)
> where rk = 1
> /
>
> MEDIAN
> ---------
> 1550
> ------------------------------------------------------------------
> ------------------------------
> Emp is a 14 row table . The median should be the average of the
> seventh and eighth values.
> I cleared the computes and columns and ran the first part of the
> SQL 101 code
>
> clear computes
> utes cleared
> select
> rownum,
> sal
> from (
> select sal
> from scott.emp
> where sal is not null
> union
> select 1 from dual where 1=2
> )
> group by sal, rownum
> /
>
> ROWNUM SAL
> ------ ----------
> 1 800
> 2 950
> 3 1100
> 4 1250
> 5 1300
> 6 1500
> 7 1600
> 8 2450
> 9 2850
> 10 2975
> 11 3000
>
> ROWNUM SAL
> ------ ----------
> 12 5000
>
> and also ran the part of my code which corresponded. I changed
> my code slightly so the salrank column would print.
>
> SQL> select salrank, sal from(select sal, row_number()
> 2 over ( order by sal) as salrank
> 3 from scott.emp)
> 4 /
>
> SALRANK SAL
> ---------- ----------
> 1 800
> 2 950
> 3 1100
> 4 1250
> 5 1250
> 6 1300
> 7 1500
> 8 1600
> 9 2450
> 10 2850
> 11 2975
>
> SALRANK SAL
> ---------- ----------
> 12 3000
> 13 3000
> 14 5000
>
> 14 rows selected.
>
> ------------------------------------------------------------------
> ----------------------------
> The reason for the different answers is now apparent. The SQL
> 101 code is tossing duplicate
> records. It's been a long time since my stats classes, but I'm
> about 99.999999% confident you don't purge duplicates when
> computing a median. But even if I'm wrong about this, the SQL
> 101 code has reduced the set to 12 members, but it is still
> computing the median as if there were 14 members;
> that is, it is taking the average of the 7th and 8th values and
> not the average of the 6th and 7th.
>
> I hope there was a caveat in SQL 101 book stating the code only
> worked against columns with unique values, not including nulls.
>
> Ian MacGregor
> Stanford Linear Accelerator Center
> ian_at_slac.stanford.edu
>
>
>
>
>
> --
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> --
> Author: MacGregor, Ian A.
> INET: ian_at_SLAC.Stanford.EDU
>
> Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
> San Diego, California -- Public Internet access / Mailing Lists
> --------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from). You may
> also send the HELP command for other information (like subscribing).

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Larry Elkins
  INET: elkinsl_at_flash.net

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).
Received on Mon Jul 02 2001 - 08:36:40 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US