Re: NULL

From: Laconic2 <laconic2_at_comcast.net>
Date: Sun, 3 Oct 2004 18:01:34 -0400
Message-ID: <kLudnQPc3LdO6_3cRVn-jg_at_comcast.com>


"mAsterdam" <mAsterdam_at_vrijdag.org> wrote in message news:415c9c59$0$78738$e4fe514c_at_news.xs4all.nl...

> Implementors have to deliver.
> If somebody could shed some light
> on the early history of NULL in databases,
> I would be grateful. Pure curiosity.

I can't shed much light on the theory, but here's a little about NULL in Oracle and Rdb

In Oracle: Oracle went the "impossible values" route. That is a NULL is represented in line, by a bit pattern that is distinct from every valid bit pattern.

The Oracle representation for numbers allows for a value I'll call "minus zero". This value is not needed for any actual number, so it got used to represent NULL in a numeric field. I think they used they same impossible value for dates. I don't know what they did about fixed length CHARS, but it's in the Orcle Concepts manual, or at least it used to be.

For VARCHAR, Oracle used a single byte, with value zero. This has the advantage of being short. Short rows are faster than long rows, so that's almost surely the reason. However, the choice is unfortunate. The empty string, the string with zero characters, is also represented by single byte with value zero. Oops! Now we can't tell the apart. But the empty string is a useful value. For example, its the identity over the concatenation operation.

At some time in the future Oracle is going to fix this. God only know how much existing code will be broken. Ah well.

Rdb used the bit vector approach. Every row, without exception, has a bit vector at the end. Each column has a bit. A 1 means a value is present, a zero means no value is present. If no value is present, the cell is filled up with zero bytes, whose meaning is immaterial. As far as taking up too much space goes, Rdb has a row compression algorithm that shortens rows real well, so it turns out to me not much of an issue, especially with long character strings.

Initially, the bit meant "missing value". It wasn't until Rdb adopted SQL as an interface that they began calling it NULL.

In addition to to "missing values" it was possible to define, at least in Datatrieve, a "default value". A default value was plugged in in place of a missing value at store time, when a default value existed. The default value was a plain old value from the domain.

I'm using past tense here, but this might still be true of Rdb, FAIK.

In general, I think Rdb's implementation is stronger. They could have made it even stronger in two ways:
First, they shouldn't have made NULL a possible value in a user defined domain. Columns should be NULLable or not (optional or manadatory) but domains shouldn't deal with NULLS. (My truth, and that of some experts).

Second, they could have stored the"reason why data is missing" in some fashion in the first byte of the cell.
That way there could have been up to 256 reason why a cell is null. Received on Mon Oct 04 2004 - 00:01:34 CEST

Original text of this message