Re: WWW/Internet 2009: 2nd CFP until 21 September x

From: Walter Mitty <wamitty_at_verizon.net>
Date: Sat, 15 Aug 2009 03:27:51 GMT
Message-ID: <XWphm.2200$Jg.1190_at_nwrddc01.gnilink.net>


"paul c" <toledobythesea_at_oohay.ac> wrote in message news:Pfmhm.41275$PH1.333_at_edtnps82...
> Walter Mitty wrote:
> ...
>> Connecting SQL and "facts" can be tricky, particularly when nulls are
>> involved.
>>
>> If I say
>>
>> select distinct OrderId, MailingAddress from Orders
>>
>> and there are some rows with Null in MailingAddress, those rows don't
>> appear in the result. The row may exist in orders, but its projection on
>> the space
>> {OrderId, MailingAddress} doesn't exist (at least in the dialects I
>> used). (assuming that what SELECT DISTINCT does is really a projection.)
>>
>> Likewise if I say select count(MailingAddress) from Orders
>>
>> I'm going to get the count of rows that have a value in MailingAddress.
>> If I compare with
>>
>> select count(1) from Orders where MailingAddress is not null
>> select count(1) from Orders where MailingAddress is null
>> select count(1) from Orders
>>
>>
>>
>> I can see what happened. If some rows actually have null in Mailing
>> address, it is as though those rows were not there when you do some
>> operation that only deals with rows containing a value in MailingAddress.
>>
>> Comparing this behaviour with either the open world assumption or the
>> closed world assumption is largely in the mind of the beholder.
>
> Thanks for those examples. They suggest to me that SQL tables with nulls
> must indeed have multiple predicates,

That's a little over my head, but I suspect that it is right.

The way I think of it is that every table with nulls in it is a materialized outer join. If you can decompose the table into multiple tables each of which has no nulls, what you discover is that a null in the combined table corresponds to an absent row in one of the decomposed tables.

Let me shift gears back into practical mode for a minute. In any database I've ever worked with, the majority of columns are not a primary key, or a foreign key or a part of a primary or foreign key, or ever appear in a where of having clause. Nulls in those columns are of almost no consequence at all in the overall behavior of queries. Shunning nulls in those cases is being overly picky.

Nulls in "important" columns almost always cause more trouble than decomposing tables would cause, but nulls in inumportant columns help keep things simple.

But I guess comments like this one are out of place in a theory newsgroup. Received on Sat Aug 15 2009 - 05:27:51 CEST

Original text of this message