Re: NULLs

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID>
Date: Mon, 07 Jan 2008 23:50:51 +0100
Message-ID: <qi95o3lc7423v4csnbq150pg41tdjefcen_at_4ax.com>


On Fri, 4 Jan 2008 03:15:05 -0800 (PST), Alfredo Novoa wrote:

>Hi Hugo,

Hi Alfredo,

(snip)
>> IMO, the real problem is not NULL, the real problem is that in reality,
>> there will always be cases where for a single occurance of an entity,
>> one or more attribute values are for whatever reason not on file. Since
>> databases are an instrument to model reality, this concept of "missing
>> information" has to be represented and handled somehow.
>
>In the Relational Model it has to be represented using propositions
>and the Closed World assumption. Nulls and propositions are
>incompatible.

Maybe in the Relational Model, but not in applications that are to be used in businesses, aka databases. I was discussing databases. I believe that this group is called comp.DATABASES.theory, not comp.relational.model.theory.

Also, more to the point, I was replying to a message that includes both the words NULL and SQL, so it should be clear that I was refering to a database using SQL. I think everyone here is in agreement that SQL is not a truly relational language, so there's no need to repeat that argument once more.

And finally, I think that the closed world assumption will be a hard sell for businesses. I found this definition "Closed world assumption: if you cannot prove P or ~P from a knowledge base KB, add ~P to the knowledge base KB." at http://cs.wwc.edu/~aabyan/Logic/CWA.html. If I understand this correctly, this implies that, if a knowledge base holds information about a person with ID XP55303, but no information whatsoever about the birthdate of this person XP55303, this person does not have a birthdate at all and hence can not exist.

Most businesses will (unless legally prohibited) gladly do business with persons who refuse to state their date of birth - as long as they can wave a valid credit card. Telling a customer she doesn't exist is bad for business. Ergo, most businesses choose the Open World assumption.

>> This removes all NULL values from the tables. But it doesn't remove them
>> completely. As soon as we have to create a report on all data for
>> persons on file, we have to outer join the two tables and get the NULLs
>> right back.
>
>A relational outer join does not produce nulls.

So what does a relational outer join between the two tables below produce:

EmployeeAge           EmployeeGender
name | age            name | gender
-----+-----           -----+--------
Jack | 43             Mary | female
JJ   | 32             Jack | male

>You might use COALESCE with SQL to specify a value for the gaps.

Yes, I can and I frequently do, for presentation purposes. Or if I use an outer join because I need to know what rows are unmatched, I frequently test with "<keycolumn> IS NULL".

I never argued that NULLs should be retained and showed on the reports for the end users, I argued that NULLs are the currently best (or, if you prefer, least bad) way to handle missing data

>> And even if there would be some way to produce such a report without
>> having any NULL, then that would still not be a good solution. Because,
>> as I said earlier, the problem is in the missing data, not in the NULLs
>> used to represent the missing data.
>
>This is a circular argument.

How so?

>> If you had the latter database
>> design and you were asked to report on people over 40 years of age,
>> would you include or exclude persons that are not in the PersonAge
>> table?
>
>We simply restrict the PersonAge table.

How would you restrict it? The sample population above is valid according to the business rules used in this business. You can not impose restrictions that would invalidate this population.

>
>> If a representation of some data can be transformed into another
>> transformation of the same data without losing or adding any
>> information, both representations are exactly equivalent and there
>> should not be any reason from a theoretic POV to dismiss one over the
>> other.
>
>Non sequitur. We could say the same about normalization and ultra-
>redundant databases.
>
>If an ultra-redundant representation of some data can be transformed
>into another
>transformation of the same data without losing or adding any
>information, both representations are exactly equivalent and there
>should not be any reason from a theoretic POV to dismiss one over the
> other.
>
>Ridiculous, isn't it?

Yes, I see where I chose my words wrong. I should have said is like "if a representation for some data can be transformed into another transformation for the same data without losing or adding any information for any possible population of the data in either representation, both representations are exactly equivalent and there should not be any reason from a theorotic POV to dismiss one over the other."

Best, Hugo

Best, Hugo Received on Mon Jan 07 2008 - 23:50:51 CET

Original text of this message