Re: Pearson-r in SQL

From: Matthias Klaey <mpky_at_hotmail.com>
Date: Sat, 25 Dec 2004 06:46:50 +0100
Message-ID: <v4vps0drt58kdda4rc04uqvgjgql1fklqa_at_4ax.com>


On 24 Dec 2004 08:22:00 -0800, "-CELKO-" <jcelko212_at_earthlink.net> wrote:

>Thank you *very much* for the references -- I can now do something over
>Christmas instead of watching parades and sappy movies on television :)
>
>I did not use AVG() because I wanted to see if anyone had an
>argument for changes to SUM() and/or COUNT(). Yes, the "^2" is not
>Standard SQL, but there are several vendor versions, such as
>POWER(x, 2) or SQR(x).
>
>I am an old FORTRAN programmer, so I tend to write (x*x) too much. We
>used to do that to avoid converting integers to float in the early
>days.
>
>>> If you want to keep them, I would first calculate a linear
>regression with the remaining pairs, say y = a + b*x, and then fill
>in the expected values, .. <<
>
>I thought about that, but I was worried that this would force the
>missing values toward a correlation, whereas an average would be more
>representative of each set of values without influence from the other
>set. Or even use a Median, as a better measure of central tendency
>within a set.
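Celko's original query is not quoted here, so the following is only a sketch, in Python rather than SQL, of how Pearson's r can be computed from nothing but the aggregates a SUM()/COUNT() query returns, using r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx*Sx) * (n*Syy - Sy*Sy)). The function name and sample pairs are hypothetical; squares are written x * x rather than with a vendor POWER().

```python
import math

def pearson_r_from_sums(pairs):
    """Pearson's r from the sums a SUM()/COUNT() query could produce.

    Hypothetical helper: pairs is a list of (x, y), with None for NULL.
    """
    # SQL's SUM() skips NULLs column by column, so without this filter
    # SUM(x) and SUM(y) could be taken over different rows.
    complete = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(complete)                          # COUNT(*)
    sx = sum(x for x, _ in complete)           # SUM(x)
    sy = sum(y for _, y in complete)           # SUM(y)
    sxx = sum(x * x for x, _ in complete)      # SUM(x*x)
    syy = sum(y * y for _, y in complete)      # SUM(y*y)
    sxy = sum(x * y for x, y in complete)      # SUM(x*y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if den == 0:
        raise ValueError("r is undefined when x or y has zero variance")
    return num / den
```

For example, pearson_r_from_sums([(1, 2), (2, 4), (None, 5), (3, 6)]) returns 1.0: the incomplete pair is dropped and the remaining three lie on y = 2x. Dividing each sum by n to get averages instead would give the same r, since n cancels.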

Here is a pretty good summary and starting point for the discussion of missing values in statistics.

  http://www.herc.research.med.va.gov/FAQ_I9.htm

>But I honestly do not know what the preferred method is.

I think there is no single correct answer; it all depends on the context of your observations. You might also want to google the terms "missing", "value" and "imputation"; you will get lots of hits.

Greetings
Matthias Kläy

-- 
www.kcc.ch
Received on Sat Dec 25 2004 - 06:46:50 CET