Re: Pearson-r in SQL

From: -CELKO- <jcelko212_at_earthlink.net>
Date: 22 Dec 2004 16:40:50 -0800
Message-ID: <1103762450.272177.44700_at_z14g2000cwz.googlegroups.com>


>> I am not familiar with the Pearson-r function, so, FWIW, I believe
the
answer lies in the answer to the question, "What does null mean in your data?" <<

This is the standard descriptive statistic for how close to linear two variables are. What I want to discuss is how to put NULLs into this aggregate function which has two parameters instead of one.

I can argue for replacing (NULL, NULL) with (x,x) for some value to say that the NULLs are alike, in a "GROUP BY" sort of way, but I am not sure what the correct practice for the (x, NULL) and (y, NULL) pairs are.

>> you could use NVL(x,0) in place of x (this would cause any null to
be represented by a zero), and likewise for y. <<

Actually, I could not do that -- I write Standard SQL and not unclean, evil dialect (ugh!) :)

>> If "null" connotes, "I don't know so I can't say for sure", then I
would drop the pair, but report on existence of these pairs separately. <<

There is a little-known or used feature in Standard SQL that says you get a SQLSTATE warning code to show that an aggregate dropped NULLs. My aggregates will give me that much.

But from a statistical view point, should missing values be repalced with some computed value like the AVG() or what? I have been away from it so long, I honestly do not know and back when I did stats for a living there was no SQL or NULLs (yes, I am old).

>> BTW, your table does not allow nulls. <<
That's why I started asking ... Received on Thu Dec 23 2004 - 01:40:50 CET

Original text of this message