Re: Pearson-r in SQL

From: Alan <not.me_at_rcn.com>
Date: Wed, 22 Dec 2004 20:36:29 -0500
Message-ID: <32ulp6F3q3bpfU1_at_individual.net>


"-CELKO-" <jcelko212_at_earthlink.net> wrote in message news:1103762450.272177.44700_at_z14g2000cwz.googlegroups.com...
> >> I am not familiar with the Pearson-r function, so, FWIW, I believe
> the
> answer lies in the answer to the question, "What does null mean in your
> data?" <<
>
> This is the standard descriptive statistic for how close to linear two
> variables are. What I want to discuss is how to put NULLs into this
> aggregate function which has two parameters instead of one.

Methinks a visit to alt.sci.math.statistics may be in order.
>
> I can argue for replacing (NULL, NULL) with (x,x) for some value to say
> that the NULLs are alike, in a "GROUP BY" sort of way, but I am not
> sure what the correct practice for the (x, NULL) and (y, NULL) pairs
> are.
>
> >> you could use NVL(x,0) in place of x (this would cause any null to
> be represented by a zero), and likewise for y. <<
>
> Actually, I could not do that -- I write Standard SQL and not unclean,
> evil dialect (ugh!) :)

Sorry. I had a hunch that was an Oracle-specific function, but I wasn't sure. Well, you could build your own with standard SQL...

>
> >> If "null" connotes, "I don't know so I can't say for sure", then I
> would drop the pair, but report on existence of these pairs separately.
> <<
>
> There is a little-known or used feature in Standard SQL that says you
> get a SQLSTATE warning code to show that an aggregate dropped NULLs.
> My aggregates will give me that much.
>
> But from a statistical view point, should missing values be repalced
> with some computed value like the AVG() or what? I have been away from
> it so long, I honestly do not know and back when I did stats for a
> living there was no SQL or NULLs (yes, I am old).

My gut feeling is that if you correlate a value with a null (unknown) the result is unknown, so the comparison is meaningless, and therefore should be dropped. At least that's what I would do without knowing any better. Akin to dividing by zero. You just can't do it. This is, of course, one person's opinion.

>
> >> BTW, your table does not allow nulls. <<
> That's why I started asking ...
>
Received on Thu Dec 23 2004 - 02:36:29 CET

Original text of this message