Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> comp.databases.theory -> Re: So what's null then if it's not nothing?

Re: So what's null then if it's not nothing?

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Fri, 25 Nov 2005 01:16:40 +0100
Message-ID: <tugco1taoancholiuf2tmadkadev9ipguh@4ax.com>


On 23 Nov 2005 15:57:55 -0800, michael_at_preece.net wrote:

>Hugo
>
>Thanks for the full reply. You'll see that I've snipped much of our
>preceding conversation. I hope you'll also see that nothing relevant to
>the points I've addressed has been snipped. Let's just move on now
>shall we?...

Hi Mike,

No problem - I hate having to scan through four pages of quoted text to finally find the new content in a message. That's why I usually snip heavily. I will now resume my normal snipping style too.

(snip)
>> You're still trying to put more meaning into it than NULL actually has.
>
>No. I'm pointing out that the source of all of the confusion that
>plainly exists in SQL-land regarding exactly what NULL is and how it
>should be used is very likely due to an inadequate definition in the
>standard. It all seems to stem from that single sentence in the SQL-92
>standard.
>
>> The only thing that NULL means is "there's no value here". Period. I'm
>> sorry if that doesn't make sense to you - but that's just the way it is.
>
>That makes perfect sense to me. What's more - I'm in total 100%
>agreement. The confusion only arises when it is interpreted to mean the
>value is absent. Can you see the difference that exists between
>something being present and yet having no value and something being
>absent? That's the whole reason for NULL - to represent something being
>present and yet having no value.

I see a big difference there - but not in database terms.

Let's face it - we never actaully store persons in a database. With the large mainframes of the 50's, it might still have been possible. With modern computers becoming smaller and smaller, there's no way that you'll ever be able to fit someone with my size in your computer.

The only thing that we do store in computers, and in databases, is values. And the lie (*) we have all agreed to believe in is that these values accurately describe some properties of some entities, events, objects or abstractions in the real world.

(*) I call it a lie, because we all know that as the amount of data in a database grows, the chance that at least some of this data is incorrect will rapidly rise to 1.

If a database holds information about a person named Hugo Kornelis, but that database holds no information about the birthday of this person, does this mean that I don't have a birthday? No, of course not - you and I know that (but the database doesn't - and a user who doesn't know anything about the UoD also doesn't, because he can't determine by name alone if "Hugo Kornelis" is a natural person or a legal person).

There will never be an actual birthdate in the database, but there will be values that represent the birthdate of a person. In this example, the assumption is that the data modeller has produced a model that allows us to store a value (representing the name of a person, and thereby also reprenting the person him- or herself) along with another value (representing a date, assumed to be the birthdate of the person represented by the first value). I also assume that the data modeller has decided to not place a mandatory constraint on the value representing birthdate, i.e. the DB will accept entries even if no value for the birthdate is given.

Now, if any process requests the value of the birthdate associated with the person represented by the name "Hugo Kornelis" (e.g. because it wants to calculate my age, or because it wants to determine if it's time to send a happy birthday card) it can't be served an answer by the database. Because that value is not present in the database. That means that the result of any expression that involves my birthdate is undefined, unless either the database or the language used or the process itself somehow defines what the result of an expression should be if one or more of the values in the expression is not present in the database.

You may have noticed that the entire description above is not about any specific data model. All the above applies to all ways of storing data: SQL, RM, Pick, Network, Hierarchical, Flat-file, or even Carving in stone tablets. They must all have *some* way to represent that the birthday of a specific person is not present in the database, even though the data modeller foresaw that birthdays of persons in general might have to be stored. They also all must have *some* rules for what the result of an expression is if the value of one of the operands in the expression is not present in the database. If they don't, then the "rule" is that the result of such expressions is undefined.

In SQL, the chosen representation for data that is not present in the database even though the data model has room for it, is to either omit a row if all values in that row are not present in the DB, or to use a NULL if only some of the values in that row are not present. And the standard also states that any expression that involves a NULL value as one or more of its operands results the 3-VL boolean value "Unknown" (*) if it's a boolean expression, and NULL otherwise.

(*) Note that the 3-VL boolean value "Unknown" is a known value: we have evaluated the condition, and we now know that the result of that evaluation is "Unknown".

Now, back to some of the points you raised above:

(partial repeat of quote above:)
> That's the whole reason for NULL - to represent something being
>present and yet having no value.

No, the whole reaons for NULL is to represent that the database doesn't have a value to put in a specific slot. Unlike Pick, an SQL database is unable to remove a "cell" (just as you can't remove a cell in a spreadsheet).

If I want to represent something being present and yet having no value, I'm trying to represent two things:
1. Is it present?
2. What, if any, is it's value?
Trying to stuff those two things into one column would be a violation of the First Normal Form.

(partial repeat of quote above, re: coonfusion in SQL-land about NULL)
> It all seems to stem from that single sentence in the SQL-92
>standard.

I don't think so. I think there are three reasons for the confusion:

  1. If a value is not present in the database, a theoretical person who has access to the database but has no domain knowledge won't know the value. To that person, the value is unknown. That doesn't mean that the value is unknown to a real user, who often (by domain knowledge) will know why that value is not present in the database. Unfortunately, the rules of NULL propagation in SQL are often explained by treating NULL as an unknown value. This does even make sense in some way: the database itself, like the theoretic person above, has no knowledge of the domain - it only knows that a value it needds is not present.
  2. The choice of the term "Unknown" for the thrid value in 3-VL adds to this confusion. After all, if we can say that "Null <> 3" is "Unknown", than why can't we say that Null itself is unknown?
  3. I already mentioned the poor quality of the treatment of NULL in some text books in an earlier message. Unfortunately, it's hard to treat NULL properly and still be able to call a book "entry-level". And it's also impossible to not treat NULL at all in a book about SQL.

(snip comparison with city center - you really lost me there!)

>> That is not possible in an SQL
>> table, which has to be rectangular: you can delete an entire row or drop
>> an entire column, but you can't delete a single cell. Setting a NULL
>> value in a column in an SQL table is equivalent to deleting a cell in a
>> Pick database.
>>
>
>OK.
>
>In Pick there are times when we need to record that "there's no value
>here" and, although we *could* delete the single cell, that would not
>be the right thing to do - so it is necessary to store NULL instead.
>That is different to recording that "the value is absent - and
>therefore unknown" - that would be a deleted "cell".

Let's use a couple of concrete examples to clarify this. I'll use the old "middle initial" example again.

  1. George W. Bush. Easy - he undoubtedly does have a middle initial. In SQL, this would be stored as the one character string 'W'. I assume that Pick does this the same way.
  2. Hugo Kornelis. Also easy -- I happen to know the guy and he definitely doesn;t have a middle initial. In SQL, this would be stored as the zero-length empty string ''. How would Pick store my middle iniital?
  3. David Cressey. I don't know him. He doesn't use a middle initial in his messages in this group. That's insufficient proof that he doesn't have one. I could ask, but since it's not really all that important, why bother? I'll just store a NULL in my SQL database. (*) How would Pick store this bit of information?

(*) In this example, the NULL happens to be a result of me not knowing Davids middle initial. But I would also have stored a NULL if David had ever asked me not to store his middle initial. Or if David Cressey was a company and not a person.

>
>In SQL, it seems, not only do you need a special value to repesent the
>fact that "there's no value here", but you also need a way of
>representing the fact that a value would not exist at all if it could,
>physically, have been deleted.

I don't understand this part. Unless I misunderstand, you seem to imply that if there's no value "here", it still is "somewhere else", as opposed to being "physically deleted". What am I missing?

> This is because, although the RM is very
>much about separating the logical from the physical, the limitations
>inherent in the column/row structure (both physical and logical)
>prevent a single cell from being able to be deleted. I'm sure I'll be
>corrected if I'm wrong on this.

A singel cell can't be deleted, because it's part of the data model, not part of the data. The value in a single cell can be deleted by setting the cell to NULL.

>
>> > If I want to record that
>> >something does not have any data value I'll record a null.
>>
>> Now you are losing me. Can you give me a concrete example of something
>> that is present without data value?
>>
>
>OK. I have a mug on my desk. It's empty. Excuse me a moment...

Assuming that "Mike's mug" is a unique identifcation for your mug:

 Mug | Location | Contained fluid

 -----------+-------------+-----------------
 Mike's mug | Mike's desk | NULL

(using tabular format as a shortcut to writing out the CREATE TABLE and INSERT statements)

>
>Now there's no mug on my desk.

 Mug | Location | Contained fluid

 -----------+-------------+-----------------
 Mike's mug | NULL        | NULL

>
>Just a moment...
>
>Now it's a mug of tea.

 Mug | Location | Contained fluid

 -----------+-------------+-----------------
 Mike's mug | Mike's desk | Tea

(snip)
>I'll admit I didn't like what I was reading. It was, incidentally, the
>definition of NULL in the documentation for M$ SQL-Server.

I'm usually quick to admit that Microsoft's documantation has it's flaws. In this case, it's a lot better than I expected. Here's a quote from Books Online:

"A value of NULL indicates the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.

"Null values usually indicate data that is unknown, not applicable, or to be added at a later time. For example, a customer's middle initial may not be known at the time the customer places an order."

The only thing that I would like to change is the first sentence:

"A value of NULL indicates that the value is absent from the database, and as a result unknown to the database engine."

(snip)
> I now
>believe I see the situation as it is. It is a confused situation. NULL
>is taken to mean multiple things. I see NULL clearly as signifying that
>"there's no value here" and consider that to be different and at odds
>with the various other interpretations attributed to it - including,
>and especially, "unknown".

NULL does indeed signify "no value here". That's all.

The implication is that the DB doesn't know what value to use in calculations that refer to this cell. The value is unknown to the database.

The reason that there's "no value here" CAN be that the value is indeed unknown. But there are other reasons as well.

You're right that this causes a lot of confusion. If I could go back in time, I'd try to make sure that better terminology was chosen.

> Naturally, and truthfully, enough - I
>believe my interpretation to be correct. I am 100% convinced that NULL
>means "there's no value here". In Pick, which essentially type-less and
>where everything is a string, it equates to an empty string.

Does this mean that there's no way in Pick to distinguish between "we know Hugo Kornelis doesn't have a middle initial" (i.e. it's an empty string), and "we have no information about David Cressey's middle initial"?

> I accept
>it is not possible to use an empty string in SQL because it is "typed"
>- and NULL is inherently type-less, so it has to be represented by a
>special value.

It IS possible to use an empty string in SQL, but only for character data - indeed, because SQL uses strong type checking.

The reason that an empty string should not be used to represent a missing value is that an empty string is, in some cases, a normal value that has to be distinguished from a missing value.

(snip)
>> NULL is a marker for an absent value.
>
>I agree that a marker is necessary to *represent* NULL in almost all
>models. I disagree that it *is* the marker itself.

Well, that's how the ANSI committe has defined it. So if you're ever forced to make the move from Pick to SQL, you'll just have to accept that NULL *is* the marker itself in SQL databases.

(snipped some to prevent myself from repeating myself over and over)

>and this leads to the
>situation where I find myself reading some documentation that tries to
>tell me that comparing NULL with anything at all - including another
>NULL - will result in "unknown". *That* I find ridiculous - although I
>think I understand now how and why the confusion arose.

Think of it as being asked to compare my birthday to the birthday of Van Nelle.

I never revealed my birthday in this thread - if you had to store this information in an SQL database, you'd have to use NULL. You probably never heard of Van Nelle before reading this message, so the birthday of Van Nelle would also be NULL in your database.

Now, what's the result of

    (Birthday of Hugo Kornelis) == (Birthday of Van Nelle)?

>
>> >The standard would have been better had it
>> >defined NULL as "No value. An empty set. A special value may be used to
>> >indicate its presence.".
>>
>> No. NULL is not "no value"; it is a value - "a special value".
>>
>> NULL is also not an empty set. It is a scalar. Sets and scalars can't be
>> swapped around at will in SQL.
>
>I have to admit I don't know what this means. Can you explain the
>essential difference between sets and scalars

Very short: a set is a collection of zero, one or more rows, that contain one or more columns. A scalar is a single value.

In SQL, there's a very clear distinction between expression where you can use a set, and expressions where you can use a scalar. If you use a scalar where a set is expected or a set where a scalar is expected, you get an error.

> - in so far as it is
>relevant to this discussion.

The only relevance to this discussion is that you can't define NULL as an empty set, because
a) NULL is designed to be used in places where a scalar is expecteed, and
b) there's no need for a symbol to represent an empty set.

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Thu Nov 24 2005 - 18:16:40 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US