Re: boolean datatype ... wtf?

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID>
Date: Wed, 06 Oct 2010 08:41:45 +0200
Message-ID: <r85oa6d5634sqruj14ihk72seo25ao7h3a_at_4ax.com>


On Tue, 5 Oct 2010 16:12:12 -0700 (PDT), Brian wrote:

(snip)
>I never said anything about trash. NULLs are not trash, they're
>indicators: no value here. Eliminating each requires not only
>splitting off the nullable column into a separate relation but also
>introducing a boolean indication to distinguish between inapplicable
>nulls and applicable nulls. The bottom line: if you're a null hater
>and a boolean hater, then you need two additional tables to eliminate
>each nullable column, along with two additional interrelational
>contraints. Are you denying that?

Hi Brian,

I have no idea where this extra relation comes from.

Yes, it is true that, in general, a NULL value can have various reasons (and that goes beyond "applicabe/inmapplicable", so a simple boolean would not do). In practice, a specific column can often only be NULL for one of those reasons, so that it is not required to store the reason; the reason is metadata that should be stored in the data dictionary, not in the data.

There are also cases where more than one reasons may apply to the NULLs in a single column. In those cases, there are two possibilities. Either the reason why the value is missing is important to the business, or it is not. If it is not, then we only need the nullable column in the design with NULLs and nothing else - and when transforming this to the NULL-less design, there is no need at all to suddenly introduce an extra indication to store information the business is not interested in.

And if the distinction between the various reasons for missing information IS important to the business, then we would have had to add another column in the design with NULLs. After all, "birthdate" and "reason birthdate is not recorded" are two completely different propositions, and any attempt to combine them in a single column would be a serious violation of 1NF. (This, by the way, is the fallacy in Date's arguuments in his "Much ado about Nothing" debate with Codd (http://www.dbdebunk.com/page/page/1706814.htm), which Codd unfortunately did not see through). So in the case where the business does want to record the reason for a missing attribute value, the only correct way to model this is to include an extra column - one that is also nullable, and that is constrained to be null when the attribute column is not null, and to be not null when the attribute column is null. Converting THIS schema to a NULL-less design will of course require the addition of two extra tables - but since they originate in two nullable columns, the ratio of (at most) one additional table for each nullable column still applies.

So yes: I am denying that more than one table is ever required to eliminate a nullable column - with or without the boolean data type.

Best, Hugo Received on Wed Oct 06 2010 - 08:41:45 CEST

Original text of this message