Re: You can't do this with MySQL!
Date: Sun, 13 Mar 2016 15:48:12 +0100
Message-ID: <9922013.5pPKr8dvML_at_PointedEars.de>
Stefan Ram wrote:
> In my SQL course I am asked relatively often about how to do > things that cannot be done with MySQL easily: > > - calculate the median(s) or quartiles of a sample
Using the definition in <https://en.wikipedia.org/wiki/Median>,
| In statistics and probability theory, a *median* is the number separating
| the higher half of a data sample, a population, or a probability
| distribution, from the lower half. The median of a finite list of numbers
| can be found by arranging all the observations from lowest value to
| highest value and picking the middle one (e.g., the median of {3, 3, 5, 9,
| 11} is 5). If there is an even number of observations, then there is no
| single middle value; the median is then usually defined to be the mean of
| the two middle values[1][2] (the median of {3, 5, 7, 9} is (5 + 7) / 2 =
| 6), which corresponds to interpreting the median as the fully trimmed
| mid-range.
calculating the median in MySQL looks rather easy to me, although it is arguably not trivial.
Suppose I have a non-empty table `tmp` in the currently selected database that has a column `year` of type INT of which I want to compute the median. Then:
- Determine the size of the sample
SET _at_count := (SELECT COUNT(*) FROM `tmp`);
2. Determine the index of the middle of the sample
(LIMIT indexes start at 0, see below)
SET _at_middle_index := (SELECT CEILING(@count / 2) - 1);
3. Determine the median
/* * Prepared Statement because you cannot use variable values * directly with LIMIT; `tmp2` because “every derived table * needs an alias” */
PREPARE `stmt`
FROM 'SET _at_avg := (
SELECT AVG(`year`) FROM (SELECT `year` FROM `tmp` ORDER BY `year` LIMIT ?, 2) AS `tmp2`)';
EXECUTE `stmt` USING _at_middle_index;
PREPARE `stmt`
FROM 'SET _at_middle := (
SELECT `year` FROM `tmp` ORDER BY `year` LIMIT ?, 1
)';
EXECUTE `stmt` USING _at_middle_index;
DEALLOCATE PREPARE `stmt`;
SET _at_median := (SELECT IF(@count % 2 = 0, @avg, @middle));
This may be refactored into a stored function called “MEDIAN”, and reused henceforth. I would be surprised if someone had not done it already.
> I take it for granted that declarative SQL is not well- > adapted to do such things.
That is your main problem: You take many things for granted that are demonstrably not so. You need to “Learn clear thought: learn to know what is from what seems to be, and what you wish to be.” (Surak)
-- PointedEars Twitter: _at_PointedEars2 Please do not cc me. / Bitte keine Kopien per E-Mail.Received on Sun Mar 13 2016 - 15:48:12 CET