Re: All hail Neo!
Date: Wed, 26 Apr 2006 15:06:28 GMT
> Frank Hamersley wrote:
>
>>Marshall Spight wrote:
>>>Frank Hamersley wrote:
>>>>I can cope with that - I would have no problem being forced to write
>>>>"select avg(age) from table where age is not null"
>>>>to get the crappy statistic that it is if the user demanded it.
>>>I dunno. If you have a whole lot of people and most of them
>>>have their ages filled in and a few don't, are you ever going
>>>to want to ask, "what is the average age", since the answer
>>>will always be "unknown." Doesn't seem much use to me.
>>>If you want to know if any of them are unknown, you could
>>>ask that specifically. But if you want to know the average
>>>age, then you want to know the average of the data you have;
>>>you're not asking about the data you don't have because
>>>you don't have it. The only useful query in there is "give
>>>me the average age for the data I have"; why should we
>>>make the way you ask for that longer winded than other,
>>>And how "crappy" is that statistic anyway? Probably not
>>>at all crappy. It's probably exactly what you want.
>>>The idea of null as something that taints everything it
>>>touches doesn't seem useful or practical to me.
>>I fully understand the sentiment however in cases like this I prefer
>>arrangements that retain flexibility for the (awake) programmer and
>>provide a form of simple clarity. i.e. if there is one null the avg()
>>is null. It would not take long for the industry to adopt this although
>>with all existing code out there a change might take on Y2K proportions!
>>This simplistic approach means any stray nulls creeping into a dataset
>>where none are expected will not go undetected if inadequately
>>constrained queries are framed. Of course this is a very late stage to be
>>worrying about data integrity but better late than never.
>>Of course this is a simple case which is why I am interested to see if
>>Bob comes up with insights into something more difficult.
>
> PMFJI, I thought it might be useful to point out that spreadsheets have
> had to address this problem since their inception. As far as I know
> they, by default, ignore empty cells when averaging ranges, as opposed
> to dropping out with an error. As such they view the empty cell as
> not existing in the range at all. If all cells are empty then no range
> exists and an error results.
If one is talking about the inception, one would have to test the hypothesis on the first release of VisiCalc. I do not consider one product repeating the errors of another product a valid argument in any case.

Received on Wed Apr 26 2006 - 17:06:28 CEST
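
For anyone following the quoted exchange, here is a minimal sketch of the two avg() behaviours being argued over. It uses Python's sqlite3 module with an in-memory database; the table name `people`, its rows, and the variable names are all made up for illustration and do not come from the thread. The CASE expression is only one possible way to get the "one null makes the avg() null" result Frank describes, not something any product does by default.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("a", 20), ("b", 30), ("c", None), ("d", 40)],
)

# Existing SQL behaviour: AVG() skips NULLs, so the explicit
# "where age is not null" filter gives the same answer.
implicit_avg = conn.execute("SELECT AVG(age) FROM people").fetchone()[0]
explicit_avg = conn.execute(
    "SELECT AVG(age) FROM people WHERE age IS NOT NULL"
).fetchone()[0]

# The stricter behaviour proposed in the thread, emulated by hand:
# if any age is NULL, COUNT(*) and COUNT(age) differ and the CASE yields NULL.
strict_avg = conn.execute(
    "SELECT CASE WHEN COUNT(*) = COUNT(age) THEN AVG(age) END FROM people"
).fetchone()[0]

# When every input is NULL, AVG() returns NULL (None in Python), not an error.
all_null_avg = conn.execute(
    "SELECT AVG(age) FROM people WHERE age IS NULL"
).fetchone()[0]

print(implicit_avg, explicit_avg, strict_avg, all_null_avg)
```

With the strict form, a stray null in a column where none is expected surfaces as a null result instead of silently shrinking the sample, which is the trade-off the quoted posts disagree about.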