Re: Computing Averages in MS Excel

From: dawn <dawnwolthuis_at_gmail.com>
Date: 8 Dec 2005 08:34:49 -0800
Message-ID: <1134059689.752533.87750_at_f14g2000cwb.googlegroups.com>


David Cressey wrote:
> A little while ago, dawn mentioned in the monster NULL thread that she was
> busy trying to
> reduce NULLS from SQL into an SQL-wrapper.

IIRC It was the smaller 2VL 3VL thread -- I don't recall contributing to the larger one.

>
> After she said that she couldn't simply filter out the missing data, the
> subject dropped.
> I'll note in passing that 2vl do, eventually, run into the necessity for
> dealing with missing data. I suspect that Pick programmers do, too.

Yes, of course. I believe I indicated that an attribute with no value can be seen as having an empty set of values assigned to it. That way the data attribute has a value (the null set) that indicates the absense of a value. An issue that arises here is that it is a good idea to have expressed values for the case when you know there is no value for an attribute (e.g. NA for not available) compared to when you do not know if there is one or not but do not have one.

>
> Anyway, that led me to reframe the question, as follows:
>
> How to present missing data to users in MS Excel, and do it in such a way
> that it won't skew averages?
>
> So I cranked up a blank worksheet, and I started out by defining a
> rectangular area on it.
> I set up four cells that would take functions on that area: AVERAGE,
> COUNT, SUM, and SUM divided by COUNT.
>
> I then proceeded to enter various things in the cells, to see what I would
> get. I was pleased with the way COUNT works:
>
> COUNT includes cells that contain numbers, and also cells that contain
> formulas that evaluate to numbers.
> COUNT excludes empty cells, labels (cells that contain text), and cells
> that contain formulas that evaluate to text strings.
>
> I didn't mess around with dates.
>
> AVERAGE and SUM use the same criterion that COUNT does. And SUM divided by
> COUNT always gave the same answer as AVERAGE.
>
> So here's my answer (before looking at any documentation): In MS Excel, I
> would use an empty cell to represent the SQL NULL. If my users asked what
> an empty cell means, I'd say, "it means there isn't anything in it."
> I would expect 99% of users to go on to the next issue without missing a
> beat.

If I'm understanding, that is the way that Pick handles it. That is why end-users never give NULL a thought there. It averages only what is there. The problem arises if you add SQL to the mix and say that you have a numeric attribute. Notice that Excel doesn't have strong typing. You can likely execute a SQL query against Excel using some tool or other. I wonder how it handles the column if you request an average of all numbers from SQL and there is data missing in some cells. I'm guessing it would give an error. That is the problem I had with a SQL wrapper on a 2VL system. If a column was defined as numeric in any way, it seemed to want numbers in it. Funny thing, eh?

Cheers! --dawn Received on Thu Dec 08 2005 - 17:34:49 CET

Original text of this message