Re: NULL
Date: Mon, 04 Oct 2004 05:59:23 GMT
Message-ID: <%E58d.317259$Fg5.299948_at_attbi_s53>
"Laconic2" <laconic2_at_comcast.net> wrote in message news:kLudnQPc3LdO6_3cRVn-jg_at_comcast.com...
>
> For VARCHAR, Oracle used a single byte, with value zero. This has the
> advantage of being short. Short rows are faster than long rows, so that's
> almost surely the reason. However, the choice is unfortunate. The empty
> string, the string with zero characters, is also represented by single
> byte with value zero. Oops! Now we can't tell the apart. But the empty
> string is a useful value. For example, its the identity over the
> concatenation operation.
This is just laughably bad design. :-( Can't tell a null string from an empty string! About the only worse thing they could have done was conflate 0 and null-integer!
> Rdb used the bit vector approach.
Excellent!
> In addition to to "missing values" it was possible to define, at least in
> Datatrieve, a "default value". A default value was plugged in in place of
> a missing value at store time, when a default value existed. The default
> value was a plain old value from the domain.
This is a quite useful feature, and one that has pretty much no implications for theory.
> In general, I think Rdb's implementation is stronger.
It certainly sounds that way!
> They could have made it even stronger in two ways:
> First, they shouldn't have made NULL a possible value in a user defined
> domain. Columns should be NULLable or not (optional or manadatory) but
> domains shouldn't deal with NULLS. (My truth, and that of some experts).
Exactly! (Am I an expert now?)
> Second, they could have stored the"reason why data is missing" in some
> fashion in the first byte of the cell.
> That way there could have been up to 256 reason why a cell is null.
Eeep! Don't go there!
The best way to think about NULL is as a cardinality issue; the whole "there *is* a value but we don't know what it is" thing takes us out of the realm of math and into the realm of speculation. What is 5 + null? Well, it has some value, all right, but it's a mysterious value that we can't explain. It comes from the afterlife, maybe.
Instead, let us conceive of a nullable int as an in that is either there or it isn't. This makes null handling *so* much easier and regular. What is the sum of a list of integers (all of which have the value 5) when the list has the following length:
2 -> 10 1 -> 5 0 -> 0
Very simple, regular, and comprehensible. This works just as well for two individual values which each are either an int or else aren't anything.
If you need 256 reasons why something is something besides a value, then algebraic data types give a principled approach for dealing with the situation. And it lets you do it in whatever way works best for your particular domain. In contrast, baking this kind of thing directly into the type system is doomed to be forever stuck in one way of doing things, that may or may not match your problem domain.
Marshall Received on Mon Oct 04 2004 - 07:59:23 CEST