Re: So what's null then if it's not nothing?

From: Alexandr Savinov <spam_at_conceptoriented.com>
Date: Mon, 21 Nov 2005 15:08:44 +0100
Message-ID: <4381d4fe$1_at_news.fhg.de>


michael_at_preece.net schrieb:
> I've been accustomed to thinking of things either having a value or
> not. If something has no value then, to me, its value is null. Its
> value is an empty string (whatever "it" is). Different to having a
> value of zero. Different to anything with a value. Now, as I read up on
> SQL, I find that null is supposed to mean "unknown". I can't easily
> accept that. Does SQL's definition of null (unknown) include the null
> I'm familiar with (no value)? That doesn't make sense. If we know
> something has no value then its not an unknown value is it? I can't
> imagine having to write code where the "if a=b then result=true else
> result=false" construct won't work - according to what I'm reading, if
> either a or b is null then I should be setting result to unknown
> instead. Just can't get my head 'round that. Shouldn't things be a lot
> simpler? If something has an unknown value then at least we know
> whether it's null (as in an empty string) or not. To me, "unknown" can
> be compared with an empty string to see if it's null or not. Sorry -
> head is spinning.

Here is a formal setting where we can define and see the difference between normal values, Unknown and Null.

Domain is a set of elements (values):

--|---|---|---|---|---|--> Domain (say, Years)   87 88 89 90 91 92

Normal value is a single element. For example, Age=1988 then it will be represented as the following possibility distribution which disables all values except for the selected one:

1----- _ -----------------
       | |  Age=1988
0_____| |________________

  --|---|---|---|---|---|--> Domain (say, Years)    87 88 89 90 91 92

Note that here 0 is impossiblitiy while 1 is possiblity (uncertainty).

If we say that this attribute (Age) does not make sense for an object then the possibility distribution is constant 0:

1-------------------------
          Age=NULL
0________________________

  --|---|---|---|---|---|--> Domain (say, Years)    87 88 89 90 91 92

If age is unknown that any value from the domain is possible:

1________________________
         Age=UNKNOWN

0
  --|---|---|---|---|---|--> Domain (say, Years)    87 88 89 90 91 92

Thus NULL is constant 0 while UNKNOWN is constant 1.

With multi-values domain everything is ok. But with two-valued domains there is a commmon misunderstanding. There is two options how can we model two-valued domains yes and no:

  --|---|--> Two-valued domain
   no yes
Here we simulate the normal case. Note that two values yes and no have equal rights and have nothing special in them - they are just some objects from the domain. Here we can choose either one from these values or NULL or UNKNOWN (4 values in total). Such an approach is good for domains where two values do not have any special logical treatment, say, Sex={M,F}. Here neither M nor F denote absence or existence or anything in this sense.

Here is the second option:

yes|--

    |
no-|--> One-valued domain
"has car"

Here the domain consists of the only element which denotes the boolean attribute (for example, has car). Two values are now encoded in possibility degrees (or any other semantic characteristics like true-false, 1-0 etc.) Here changing a value of the attribute means changing its semantic (logical) value rather than changing a value from the domain (there is only one value in the domain). Such a mechanism is convenient when we are talking about existence/non-existence, appropriateness and similar things. A criterion is that these two values have a (logical) meaning for the database (for the data model) and it takes then into account when manipulates data. In contrast, when we use the first option, the values from the domain do not make any sense for the underlying model. In other words, in the first approach the model does not see a difference between M and F because they are two elements from the domain (horisontally ordered). In the second case the model does make significant difference between yes and no because they semantic values (vertically ordered). Another difference is that in the first case there is 4 possible distributions while in the second case only 2.

Anyway, the idea is that NULL and UNKNOWN have concrete semantics that can be expressed as complete impossibility and complete possibility. (There are also other compatible treatments.)

-- 
http://conceptoriented.com
Received on Mon Nov 21 2005 - 15:08:44 CET

Original text of this message