| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> comp.databases.theory -> Re: Pearson-r in SQL
>> I am not familiar with the Pearson-r function, so, FWIW, I believe
the
answer lies in the answer to the question, "What does null mean in your
data?" <<
This is the standard descriptive statistic for how close to linear two variables are. What I want to discuss is how to put NULLs into this aggregate function which has two parameters instead of one.
I can argue for replacing (NULL, NULL) with (x,x) for some value to say that the NULLs are alike, in a "GROUP BY" sort of way, but I am not sure what the correct practice for the (x, NULL) and (y, NULL) pairs are.
>> you could use NVL(x,0) in place of x (this would cause any null to
be represented by a zero), and likewise for y. <<
Actually, I could not do that -- I write Standard SQL and not unclean, evil dialect (ugh!) :)
>> If "null" connotes, "I don't know so I can't say for sure", then I
would drop the pair, but report on existence of these pairs separately.
<<
There is a little-known or used feature in Standard SQL that says you get a SQLSTATE warning code to show that an aggregate dropped NULLs. My aggregates will give me that much.
But from a statistical view point, should missing values be repalced with some computed value like the AVG() or what? I have been away from it so long, I honestly do not know and back when I did stats for a living there was no SQL or NULLs (yes, I am old).
>> BTW, your table does not allow nulls. <<
That's why I started asking ...
Received on Wed Dec 22 2004 - 18:40:50 CST
![]() |
![]() |