Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> Calculating The Median: Error Discovered in Oracle SQL 101 Code

Calculating The Median: Error Discovered in Oracle SQL 101 Code

From: MacGregor, Ian A. <ian_at_SLAC.Stanford.EDU>
Date: Sat, 30 Jun 2001 18:04:51 -0700
Message-ID: <F001.0033E8B5.20010630180020@fatcity.com>

I ran the code from Oracle SQL 101 which Jared posted modifying it to find the median of the sal column on that table

SQL> select
  2 rownum,
  3 sal
  4 from (
  5 select sal
  6 from scott.emp
  7 where sal is not null
  8 union
  9 select 1 from dual where 1=2
 10 )
 11 group by sal, rownum
 12 having rownum >= (
 13 select decode( mod(total_freq,2),
 14 1,trunc(total_freq/2 + 1),
 15 0,trunc(total_freq/2)
 16 )
 17 from (
 18 select count(*) total_freq
 19 from scott.emp
 20 where sal is not null
 21 )
 22 )
 23 and rownum <= (
 24 select decode( mod(total_freq,2),
 25 1,trunc(total_freq/2 + 1),
 26 0,trunc(total_freq/2 + 1)
 27 )
 28 from (
 29 select count(*) total_freq
 30 from scott.emp
 31 where sal is not null
 32 )
 33 )
 34 /

               values
             averaged

    ROWNUM in median
---------- ----------
         7       1600
         8       2450
           ----------
Median           2025


--------------------------------------------------------------------------------------
This answer is different from the result of the code I posted which uses the new analytical functions.

 select
 case

     when mod(number_salaried,2) = 0 then
        (select sum(sal)/2 from(select sal, row_number()
        over ( order by sal) as salrank
        from scott.emp)
        where salrank  = number_salaried/2
        or salrank = number_salaried/2 +1)
     else
        (select sal from(select sal, row_number()
        over ( order by sal) as salrank
        from scott.emp)
        where salrank = ceil(number_salaried/2))
 end median
 from (select sal,rank() over (order by sal) as rk from scott.emp),  (select count(sal) number_salaried from scott.emp)  where rk = 1
/

   MEDIAN


     1550



Emp is a 14 row table . The median should be the average of the seventh and eighth values.
I cleared the computes and columns and ran the first part of the SQL 101 code

 clear computes
utes cleared
 select
  rownum,
  sal
 from (
  select sal
  from scott.emp
  where sal is not null
  union
  select 1 from dual where 1=2
 )
 group by sal, rownum
 /

ROWNUM SAL
------ ----------

     1        800
     2        950
     3       1100
     4       1250
     5       1300
     6       1500
     7       1600
     8       2450
     9       2850
    10       2975
    11       3000

ROWNUM        SAL
------ ----------
    12       5000

and also ran the part of my code which corresponded. I changed my code slightly so the salrank column would print.

SQL> select salrank, sal from(select sal, row_number()

  2          over ( order by sal) as salrank
  3          from scott.emp)

  4 /

   SALRANK SAL
---------- ----------

         1        800
         2        950
         3       1100
         4       1250
         5       1250
         6       1300
         7       1500
         8       1600
         9       2450
        10       2850
        11       2975

   SALRANK        SAL

---------- ----------
12 3000 13 3000 14 5000

14 rows selected.



The reason for the different answers is now apparent. The SQL 101 code is tossing duplicate
records. It's been a long time since my stats classes, but I'm about 99.999999% confident you don't purge duplicates when computing a median. But even if I'm wrong about this, the SQL 101 code has reduced the set to 12 members, but it is still computing the median as if there were 14 members; that is, it is taking the average of the 7th and 8th values and not the average of the 6th and 7th.

I hope there was a caveat in SQL 101 book stating the code only worked against columns with unique values, not including nulls.

Ian MacGregor
Stanford Linear Accelerator Center
ian_at_slac.stanford.edu

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: MacGregor, Ian A.
  INET: ian_at_SLAC.Stanford.EDU

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists

--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
Received on Sat Jun 30 2001 - 20:04:51 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US