Re: All hail Neo!

From: Frank Hamersley <terabitemightbe_at_bigpond.com>
Date: Thu, 27 Apr 2006 03:38:16 GMT
Message-ID: <I0X3g.17125$vy1.12367_at_news-server.bigpond.net.au>


Marshall Spight wrote:
> Frank Hamersley wrote:

>> Marshall Spight wrote:
>>> The idea of null as something that taints everything it
>>> touches doesn't seem useful or practical to me.
>> I fully understand the sentiment however in cases like this I prefer
>> arrangements that retain flexibility for the (awake) programmer and
>> provide a form of simple clarity.

>
> I agree, however I think what I described better meets these
> criteria than what you described! As far as flexibility goes,
> you can always put a CASE statement around the value
> if you need to.

Good point - however the use of aggregate functions in GROUP BY or HAVING clauses would become quite unwieldy if you had to clone your CASE statement several times in the query. Better to express it once in the WHERE clause, no?
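To sketch what I mean (SQLite driven from Python here, purely as a stand-in; the table and column names are invented for the purpose):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 10.0), ("east", None),
                  ("west", 20.0), ("west", 30.0)])

# Wrapping each aggregate in COALESCE/CASE must be repeated for every
# aggregate in the query (here NULL is coerced to 0 before averaging).
wrapped = conn.execute("""
    SELECT region, AVG(COALESCE(amount, 0))
    FROM orders GROUP BY region ORDER BY region
""").fetchall()

# One predicate in the WHERE clause covers every aggregate at once.
filtered = conn.execute("""
    SELECT region, AVG(amount)
    FROM orders WHERE amount IS NOT NULL
    GROUP BY region ORDER BY region
""").fetchall()
```

Note the two are not equivalent - coercing NULL to 0 drags the "east" average down to 5.0, while filtering in WHERE drops the row and yields 10.0 - which is exactly why cloning the CASE everywhere is both tedious and risky.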

>>  i.e. if there is one null the avg() is null.

>
> I can't ever imagine a situation where this would be useful
> or desirable. Anyone?

I can only think of the case where you do not want to emit an avg() if any of the tuples hold null - sort of an integrity issue. Of course a rigid constraint like this can be managed in many other ways with DDL, and I would always insist that approach form the bulk (say 80%) of a solution. However there is always the practical possibility that those constraints fail on occasion, and at that point an "unmanaged" avg() producing a null result might be useful to signal an error.
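As things stand, of course, standard SQL's avg() silently skips NULLs rather than tainting the result, so the stray null has to be detected some other way - the usual trick being to compare COUNT(*) against COUNT(column). A quick sketch (SQLite via Python, invented names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")
conn.executemany("INSERT INTO readings VALUES (?)",
                 [(1.0,), (2.0,), (None,)])

# Standard AVG() quietly ignores the NULL row, so the stray value
# goes unnoticed unless you ask for the row counts as well.
avg, n_rows, n_values = conn.execute(
    "SELECT AVG(value), COUNT(*), COUNT(value) FROM readings"
).fetchone()

# COUNT(*) counts all rows; COUNT(value) counts only non-NULL ones.
# A mismatch is the signal that something crept past the constraints.
null_detected = n_rows != n_values
```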

> Let's say I had a customer database with 1000 customers in
> it, and I have the age of 999 of them, and one null. Let's say
> I hire an analyst because I want to study some things about
> my customer database. Maybe I want to do some TV advertising,
> so I want to know some things about the demographics of my
> customers. I ask the analyst, what is the average age of my
> customer base, and what does each decile look like? The
> analyst comes back later, and says, the average age of
> our customer base is unknown. For each decile, there are
> 100 people, and the average of each decile is unknown,
> and the range of ages in each decile is unknown. I'm going
> to fire that analyst and get a good one.

Yep - especially if he is writing SQL queries without knowing how they function. Of course this is a trivial issue that industry types would adapt to quite easily. What I am more interested in is the uglier cases, and analysing the risk to the existing code that is out there.
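For the record, your analyst is safe under today's rules - SQL's avg() already leaves the unknown row out of the aggregate, so the 999 known ages produce a perfectly usable answer. A quick check (SQLite via Python, with dummy ages standing in for real customer data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER)")

# 999 customers with known ages, one whose age is unknown (NULL).
conn.executemany("INSERT INTO customers VALUES (?)",
                 [(20 + i % 50,) for i in range(999)] + [(None,)])

# AVG() skips the NULL row: the result is the average of the 999
# known ages, not NULL - no analyst need be fired.
avg_age = conn.execute("SELECT AVG(age) FROM customers").fetchone()[0]
```

It is the *proposed* tainting behaviour, not the current one, that would hand him "unknown" across the board.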

>> This simplistic approach means any stray nulls creeping into a dataset
>> where none are expected will not go undetected if inadequately
>> constrained queries are framed.

>
> I don't see this as an issue. The schema controls whether nulls
> are allowed or not; if they are not allowed, they won't be there,
> and if they are allowed, they are pretty much certain to be there.

I always allow for the best intentions of mice and men to reduce that from certainty (aka 1.00).

> And again, if for some strange reason you want to query for
> that, you can.

True - either the current or the proposed behaviour is manageable if you know which is in place. To me the latter has a degree more purity and so is desirable. But adopting it would have radical effects on many other null treatments - some of which may not be so palatable.
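Indeed, the proposed "tainting" avg() can already be emulated on top of today's semantics with a single CASE, which rather supports the "manageable either way" point. A sketch (SQLite via Python; the table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(1.0,), (2.0,), (None,)])

STRICT_AVG = """
    SELECT CASE WHEN COUNT(*) = COUNT(x) THEN AVG(x) END FROM t
"""

# With a stray NULL present, the counts differ and the whole
# aggregate comes back NULL - the proposed tainting behaviour.
tainted = conn.execute(STRICT_AVG).fetchone()[0]

# Once the offending row is gone, the ordinary average reappears.
conn.execute("DELETE FROM t WHERE x IS NULL")
clean = conn.execute(STRICT_AVG).fetchone()[0]
```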

Cheers, Frank.

Received on Thu Apr 27 2006 - 05:38:16 CEST
