Re: boolean datatype ... wtf?

From: Brian <brian_at_selzer-software.com>
Date: Wed, 6 Oct 2010 11:14:17 -0700 (PDT)
Message-ID: <afd2fd6c-989b-48ba-bfb7-cc27acf5dcec_at_i13g2000yqd.googlegroups.com>


On Oct 6, 2:41 am, Hugo Kornelis
<h..._at_perFact.REMOVETHIS.info.INVALID> wrote:
> On Tue, 5 Oct 2010 16:12:12 -0700 (PDT), Brian wrote:
>
> (snip)
>
> >I never said anything about trash.  NULLs are not trash, they're
> >indicators: no value here.  Eliminating each requires not only
> >splitting off the nullable column into a separate relation but also
> >introducing a boolean indication to distinguish between inapplicable
> >nulls and applicable nulls.  The bottom line: if you're a null hater
> >and a boolean hater, then you need two additional tables to eliminate
> >each nullable column, along with two additional interrelational
> >contraints.  Are you denying that?
>
> Hi Brian,
>
> I have no idea where this extra relation comes from.
>
> Yes, it is true that, in general, a NULL value can have various reasons
> (and that goes beyond "applicabe/inmapplicable", so a simple boolean
> would not do).  In practice, a specific column can often only be NULL
> for one of those reasons, so that it is not required to store the
> reason; the reason is metadata that should be stored in the data
> dictionary, not in the data.
>
> There are also cases where more than one reasons may apply to the NULLs
> in a single column. In those cases, there are two possibilities. Either
> the reason why the value is missing is important to the business, or it
> is not. If it is not, then we only need the nullable column in the
> design with NULLs and nothing else - and when transforming this to the
> NULL-less design, there is no need at all to suddenly introduce an extra
> indication to store information the business is not interested in.
>
> And if the distinction between the various reasons for missing
> information IS important to the business, then we would have had to add
> another column in the design with NULLs. After all, "birthdate" and
> "reason birthdate is not recorded" are two completely different
> propositions, and any attempt to combine them in a single column would
> be a serious violation of 1NF. (This, by the way, is the fallacy in
> Date's arguuments in his "Much ado about Nothing" debate with Codd
> (http://www.dbdebunk.com/page/page/1706814.htm), which Codd
> unfortunately did not see through). So in the case where the business
> does want to record the reason for a missing attribute value, the only
> correct way to model this is to include an extra column - one that is
> also nullable, and that is constrained to be null when the attribute
> column is not null, and to be not null when the attribute column is
> null. Converting THIS schema to a NULL-less design will of course
> require the addition of two extra tables - but since they originate in
> two nullable columns, the ratio of (at most) one additional table for
> each nullable column still applies.
>
> So yes: I am denying that more than one table is ever required to
> eliminate a nullable column - with or without the boolean data type.
>
> Best, Hugo

I think you misunderstand what I'm saying: whenever there is no value, either there isn't supposed to be a value, or there is supposed to be a value but it hasn't been supplied. In other words, either the predicate of the atomic formula that expresses the assertion does not have a place for an individual term that corresponds to the nullable column name, or there is a place but an individual variable instead of an individual constant occupies that place. Why there is a variable instead of a constant in that place doesn't change the fact that the predicate still has a place for that term. So whether there is supposed to be a value determines the atom's predicate. It follows that since it can't be determined just from the absence of a value whether there is supposed to be a value, it must be explicitly stated that there is or is not supposed to be a value. If you're only a null hater, then only two relvars are needed: one with an additional boolean attribute that specifies for each tuple whether there should be a value or not, and another relvar for the values that have been supplied. If you're also a boolean hater, then the relvar with the boolean attribute would have to be split as well. Received on Wed Oct 06 2010 - 20:14:17 CEST

Original text of this message