Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Calculate median with ORACLE??

Re: Calculate median with ORACLE??

From: <Solomon.Yakobson_at_entex.com>
Date: 1997/02/19
Message-ID: <856385913.31670@dejanews.com>#1/1

First lets agree on a definition. In order to calculate median you need to:

  1. exclude all the rows where column value is null;
  2. arrange column values in ascending order;
  3. if number of rows where column value is not null is odd, median is column value in the middle row;
  4. if number of rows where column value is not null is even, median is average of column values in two middle rows;

The main issue here is to force ORACLE select rows ordered by column value. To do that you need to create an index on that column. To force ORACLE to use it you need to add a where clause condition column >= MIN(column) and (optionally) add an index hint to your select statement. Now you need to know upfront number of not null (for the column) rows, is it odd or even and column MIN value. To do that you can use subquery in from clause (I assume your ORACLE version is 7.2 or higher. If not, you can easily move that subquery to where clause). Then using decode you can replace not needed column values with 0, sum everything up and divide by two. Note that sum is always divided by 2, therefore if count is odd query doubles the middle row value.
Below is an example:

SQL> desc xyz

 Name                            Null?    Type
 ------------------------------- -------- ----
 X                                        NUMBER

SQL> select * from xyz;

         X


        10
         5
         1
         4
         3

Notice that column X values physically are stored not ordered!!!

SQL> create index xyz_i on xyz(x);

Index created.

Index is a must!!!

SQL> set autotrace on
SQL> set echo on
SQL> @median
 SELECT  /*+ INDEX(XYZ_I,XYZ) */
         SUM(DECODE(ROWNUM,Half_Count,DECODE(Remainder,0,X,0),
                           Half_Count+1,DECODE(Remainder,0,X,2*X),
                                        0
                   )
             ) / 2 Median
   FROM  XYZ,
         (SELECT  TRUNC(COUNT(*) / 2) Half_Count,
                  MOD(COUNT(*),2) Remainder,
                  MIN(X) Min_Value
            FROM  XYZ
            WHERE X IS NOT NULL
         )

   WHERE X >= Min_Value
 /

    Median


         4

Execution Plan ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=101 Card=10000 Bytes=1170000)

   1 0 SORT (AGGREGATE)

   2    1     COUNT
   3    2       NESTED LOOPS (Cost=101 Card=10000 Bytes=1170000)
   4    3         VIEW (Cost=1 Card=100 Bytes=6500)
   5    4           SORT (AGGREGATE)
   6    5             TABLE ACCESS (FULL) OF 'XYZ' (Cost=1 Card=100 By
          tes=2600)

   7    3         INDEX (RANGE SCAN) OF 'XYZ_I' (NON-UNIQUE)

This is the most important step. ORACLE is being forced to use XYZ_I index. If you will remove WHERE X >= Min_Value condition ORACLE will do full scan of XYZ table in the physical order and result will be wrong!!!

SQL> @wrong_median
 SELECT /*+ INDEX(XYZ_I,XYZ) */

         SUM(DECODE(ROWNUM,Half_Count,DECODE(Remainder,0,X,0),
                           Half_Count+1,DECODE(Remainder,0,X,2*X),
                                        0
                   )
             ) / 2 Median
   FROM  XYZ,
         (SELECT  TRUNC(COUNT(*) / 2) Half_Count,
                  MOD(COUNT(*),2) Remainder,
                  MIN(X) Min_Value
            FROM  XYZ
            WHERE X IS NOT NULL
         )

   WHERE X IS NOT NULL
 /

    Median


         1

Execution Plan ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2001 Card=200000 Bytes=23400000)

   1 0 SORT (AGGREGATE)

   2    1     COUNT
   3    2       NESTED LOOPS (Cost=2001 Card=200000 Bytes=23400000)
   4    3         TABLE ACCESS (FULL) OF 'XYZ' (Cost=1 Card=2000 Bytes
          =104000)

   5    3         VIEW
   6    5           SORT (AGGREGATE)
   7    6             TABLE ACCESS (FULL) OF 'XYZ' (Cost=1 Card=2000 B
          ytes=26000)

As you can see forcing ORACLE to use index is a must.

Solomon.Yakobson_at_entex.com

In article <5ef09r$qdu_at_nr1.ottawa.istar.net>,   "Sylvain Grenier" <grenie02_at_cnmtl2.cn.ca> wrote:

>
> Hi,
>
> I recently had a request to calculate some statistics on a Oracle table
> values.  I have to calculate minimum, maximum, average and median.  The
> three first are not to complicated but a have no idea how to calculate the
> median.  Is there someone that ever had that problem before?  If yes, is
> there a way?
>
> Thanks in advance.
>
> Sylvain.  (grenie02_at_cnmtl2.cn.ca)


-------------------==== Posted via Deja News ====-----------------------
http://www.dejanews.com/ Search, Read, Post to Usenet
Received on Wed Feb 19 1997 - 00:00:00 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US