Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Mailing Lists -> Oracle-L -> RE: Calculating The Median: Error Discovered in Oracle SQL 101 Co de
I was 99.999999% that one doesn't toss duplicates when computing a median; thanks to the link Jared has posted, I'm now 100% confident. The Oracle SQL 101 code will correctly compute a median only if the values in the column it is applied against are unique, not including nulls. Median calculations are almost always performed against populations which have duplicate values.
It's been stated before, "All books have errors". Oracle SQL 101 should not be condemned to the trash for the error, but the mistake should enforce the rule that code posted here or written in a book by must be thoroughly tested before being employed. This rule especially includes the code I post.
Ian MacGregor
Stanford Linear Accelerator Center
ian_at_slac.stanford.edu
-----Original Message-----
Sent: Sunday, July 01, 2001 12:20 AM
To: Multiple recipients of list ORACLE-L
Code
Try this link:
http://math.about.com/science/math/library/howto/htmean.htm
Jared
On Saturday 30 June 2001 19:00, MacGregor, Ian A. wrote:
> I ran the code from Oracle SQL 101 which Jared posted modifying it to find
> the median of the sal column on that table
>
> SQL> select
> 2 rownum,
> 3 sal
> 4 from (
> 5 select sal
> 6 from scott.emp
> 7 where sal is not null
> 8 union
> 9 select 1 from dual where 1=2
> 10 )
> 11 group by sal, rownum
> 12 having rownum >= (
> 13 select decode( mod(total_freq,2),
> 14 1,trunc(total_freq/2 + 1),
> 15 0,trunc(total_freq/2)
> 16 )
> 17 from (
> 18 select count(*) total_freq
> 19 from scott.emp
> 20 where sal is not null
> 21 )
> 22 )
> 23 and rownum <= (
> 24 select decode( mod(total_freq,2),
> 25 1,trunc(total_freq/2 + 1),
> 26 0,trunc(total_freq/2 + 1)
> 27 )
> 28 from (
> 29 select count(*) total_freq
> 30 from scott.emp
> 31 where sal is not null
> 32 )
> 33 )
> 34 /
>
> values
> averaged
> ROWNUM in median
> ---------- ----------
> 7 1600
> 8 2450
> ----------
> Median 2025
>
> ---------------------------------------------------------------------------
>----------- This answer is different from the result of the code I posted
> which uses the new analytical functions.
>
> select
> case
> when mod(number_salaried,2) = 0 then
> (select sum(sal)/2 from(select sal, row_number()
> over ( order by sal) as salrank
> from scott.emp)
> where salrank = number_salaried/2
> or salrank = number_salaried/2 +1)
> else
> (select sal from(select sal, row_number()
> over ( order by sal) as salrank
> from scott.emp)
> where salrank = ceil(number_salaried/2))
> end median
> from (select sal,rank() over (order by sal) as rk from scott.emp),
> (select count(sal) number_salaried from scott.emp)
> where rk = 1
> /
>
> MEDIAN
> ---------
> 1550
> ---------------------------------------------------------------------------
>--------------------- Emp is a 14 row table . The median should be the
> average of the seventh and eighth values. I cleared the computes and
> columns and ran the first part of the SQL 101 code
>
> clear computes
> utes cleared
> select
> rownum,
> sal
> from (
> select sal
> from scott.emp
> where sal is not null
> union
> select 1 from dual where 1=2
> )
> group by sal, rownum
> /
>
> ROWNUM SAL
> ------ ----------
> 1 800
> 2 950
> 3 1100
> 4 1250
> 5 1300
> 6 1500
> 7 1600
> 8 2450
> 9 2850
> 10 2975
> 11 3000
>
> ROWNUM SAL
> ------ ----------
> 12 5000
>
> and also ran the part of my code which corresponded. I changed my code
> slightly so the salrank column would print.
>
> SQL> select salrank, sal from(select sal, row_number()
> 2 over ( order by sal) as salrank
> 3 from scott.emp)
> 4 /
>
> SALRANK SAL
> ---------- ----------
> 1 800
> 2 950
> 3 1100
> 4 1250
> 5 1250
> 6 1300
> 7 1500
> 8 1600
> 9 2450
> 10 2850
> 11 2975
>
> SALRANK SAL
> ---------- ----------
> 12 3000
> 13 3000
> 14 5000
>
> 14 rows selected.
>
> ---------------------------------------------------------------------------
>------------------- The reason for the different answers is now apparent.
> The SQL 101 code is tossing duplicate records. It's been a long time
> since my stats classes, but I'm about 99.999999% confident you don't purge
> duplicates when computing a median. But even if I'm wrong about this, the
> SQL 101 code has reduced the set to 12 members, but it is still computing
> the median as if there were 14 members; that is, it is taking the average
> of the 7th and 8th values and not the average of the 6th and 7th.
>
> I hope there was a caveat in SQL 101 book stating the code only worked
> against columns with unique values, not including nulls.
>
> Ian MacGregor
> Stanford Linear Accelerator Center
> ian_at_slac.stanford.edu
-- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: Jared Still INET: jkstill_at_cybcon.com Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). -- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: MacGregor, Ian A. INET: ian_at_SLAC.Stanford.EDU Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).Received on Sun Jul 01 2001 - 10:47:13 CDT