Re: All hail Neo!

From: Bob Badour <>
Date: Wed, 26 Apr 2006 19:35:47 GMT
Message-ID: <nYP3g.66236$>

Marshall Spight wrote:

> Frank Hamersley wrote:

>>Marshall Spight wrote:
>>>The idea of null as something that taints everything it
>>>touches doesn't seem useful or practical to me.
>>I fully understand the sentiment however in cases like this I prefer
>>arrangements that retain flexibility for the (awake) programmer and
>>provide a form of simple clarity.

> I agree, however I think what I described better meets these
> criteria than what you described! As far as flexibility goes,
> you can always put a CASE statement around the value
> if you need to.

Since null reduces expressibility relative to the RM, the premise above is false. Having no ability to express an important concept is neither flexible nor powerful, and null reduces one's ability. Pretending that this has no practical consequence is intellectually dishonest.

I can only surmise Frank uses some special sort of Orwellian logic. Not Orwellian in the sense that Orwell would ever use it himself, but in the sense of the so-called logic Orwell described so well. Frank's reasoning is fallacious as already established here.

>> i.e. if there is one null the avg() is null.

> I can't ever imagine a situation where this would be useful
> or desirable. Anyone?

I will be kind and assume your lack of imagination results from imbibing too much of the elixir itself.

Since when are correct answers neither useful nor desirable? Do you prefer your employees give you incorrect answers to avoid admitting when they do not know an answer?

> Let's say I had a customer database with 1000 customers in
> it, and I have the age of 999 of them, and one null.

Let's suppose instead you are a direct marketer whose database records facts on 25 million customers. Let's further suppose your demographic information is incomplete based as it is on self-reported questionnaires.

Let's suppose the women leave the sex field blank more often than men due to feelings of vulnerability (and women living alone more often than women in general.) Let's also suppose that women between the ages of 35 and 55 omit their birthdates more often than any other cohort due to reasons of vanity.

While you may not know it, half of the people who left both the sex and birthdate blank on the questionnaire are women between the ages of 35 and 55 living alone.

You send out a test mailing for a product that makes it very easy to fix or replace leaky faucets. You get an unheard of 10% response rate among women between the ages of 35 and 55 living alone which is a 5% response rate among those who left both sex and birthdate blank and a 1.5% response rate overall.

Your regression analysis indicates age and sex correlate highly with purchasers, and combined they correlate even more.

You say: "Great! What are the min, max and average ages of purchasers? What percentage are men vs. women?"

select min(today-birthdate), max(today-birthdate), avg(today-birthdate) , sum(case when sex='male' then 1 else 0 end) * 100 / count(*) , sum(case when sex='female' then 1 else 0 end) * 100 / count(*) from campaign_responders;

Your database tells you your max age is 42, the min age is 22, the average age is 27, men were 80% of your responders and women were 20%.

You issue a few more queries to refine things and you decide to send out a massive mailing to 50 million men aged 24 to 35 from every list you can get your hands on. You spend millions and lose your shirt.

Your competitor uses a product that gives a correct answer, sends out a massive mailing to 25 million people who left both fields blank from every list he can get his hands on. He sells over a million units for a $10 million dollar profit while expanding his database by a million names.

  Let's say
> I hire an analyst because I want to study some things about
> my customer database. Maybe I want to do some TV advertising,
> so I want to know some things about the demographics of my
> customers. I ask the analyst, what is the average age of my
> customer base, and what does each decile look like? The
> analyst comes back later, and says, the average age of
> our customer base is unknown. For each decile, there are
> 100 people, and the average of each decile is unknown,
> and the range of ages in each decile is unknown. I'm going
> to fire that analyst and get a good one.

As would I. He should say: "How do you want me to handle the unknown ages?" Or: "Only 1 age in your database was unknown. That will not have a significant impact on the following deciles, which I have rounded to the nearest year in any case:..."

>>This simplistic approach means any stray nulls creeping into a dataset
>>where none are expected will not go undetected if inadequately
>>constrained queries are framed.

> I don't see this as an issue. The schema controls whether nulls
> are allowed or not; if they are not allowed, they won't be there,
> and if they are allowed, they are pretty much certain to be there.
> And again, if for some strange reason you want to query for
> that, you can.

But how do you ask it to deliver the correct average? Or the correct minimum? Maximum? Sum? Null simply prevents one from asking the question. Hence the illusion of power. Received on Wed Apr 26 2006 - 21:35:47 CEST

Original text of this message