Re: You can't do this with MySQL!

From: Thomas 'PointedEars' Lahn <PointedEars_at_web.de>
Date: Sun, 13 Mar 2016 15:48:12 +0100
Message-ID: <9922013.5pPKr8dvML_at_PointedEars.de>


Stefan Ram wrote:

>   In my SQL course I am asked relatively often about how to do
>   things that cannot be done with MySQL easily:
> 
>     - calculate the median(s) or quartiles of a sample

Using the definition in <https://en.wikipedia.org/wiki/Median>,

| In statistics and probability theory, a *median* is the number separating
| the higher half of a data sample, a population, or a probability
| distribution, from the lower half. The median of a finite list of numbers
| can be found by arranging all the observations from lowest value to
| highest value and picking the middle one (e.g., the median of {3, 3, 5, 9,
| 11} is 5). If there is an even number of observations, then there is no
| single middle value; the median is then usually defined to be the mean of
| the two middle values[1][2] (the median of {3, 5, 7, 9} is (5 + 7) / 2 =
| 6), which corresponds to interpreting the median as the fully trimmed
| mid-range.

calculating the median in MySQL looks rather easy to me, although it is arguably not trivial.

Suppose I have a non-empty table `tmp` in the currently selected database that has a column `year` of type INT of which I want to compute the median. Then:

  1. Determine the size of the sample

   SET _at_count := (SELECT COUNT(*) FROM `tmp`);

2. Determine the index of the middle of the sample

   (LIMIT indexes start at 0, see below)

   SET _at_middle_index := (SELECT CEILING(@count / 2) - 1);

3. Determine the median

   /*
    * Prepared Statement because you cannot use variable values
    * directly with LIMIT; `tmp2` because “every derived table
    * needs an alias”
    */

   PREPARE `stmt`
   FROM 'SET _at_avg := (
     SELECT AVG(`year`)
     FROM (SELECT `year` FROM `tmp` ORDER BY `year` LIMIT ?, 2) AS `tmp2`
   )';

   EXECUTE `stmt` USING _at_middle_index;

   PREPARE `stmt`
   FROM 'SET _at_middle := (
     SELECT `year` FROM `tmp` ORDER BY `year` LIMIT ?, 1    )';

   EXECUTE `stmt` USING _at_middle_index;

   DEALLOCATE PREPARE `stmt`;

   SET _at_median := (SELECT IF(@count % 2 = 0, @avg, @middle));

This may be refactored into a stored function called “MEDIAN”, and reused henceforth. I would be surprised if someone had not done it already.

> I take it for granted that declarative SQL is not well- > adapted to do such things.

That is your main problem: You take many things for granted that are demonstrably not so. You need to “Learn clear thought: learn to know what is from what seems to be, and what you wish to be.” (Surak)

-- 
PointedEars

Twitter: _at_PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
Received on Sun Mar 13 2016 - 15:48:12 CET

Original text of this message