Re: So what's null then if it's not nothing?

From: DonR <donr_work_at_yahoo.com>
Date: 18 Nov 2005 13:49:17 -0800
Message-ID: <1132350557.673092.92800_at_g49g2000cwa.googlegroups.com>


Hello Hugo,

Since I'm a "Pickie", I'll throw my 2 cents in here. Caution, a long post.

UniVerse is one of several Pick-like system, I'll use Pick as a generic name here.
The IBM UniVerse programmers manual has this to say about the NULL value.

<Beginning of quote>
Unknown Data: The Null Value

The null value has a special run-time data type in UniVerse BASIC. It was added to UniVerse BASIC for compatibility with UniVerse SQL. The null value represents data whose value is unknown. Note: Do not confuse the null value with the empty string. The empty string is a character string of zero length which is known to have no value. Unlike null, whose value is defined as unknown, the value of the empty string is known. You cannot use the empty string to represent the null value, nor can you use the null value to represent "no value." <end of quote>

As stated, the null value was added to comply with SQL standards. I don't know if any of the other Pick systems define null and if they do, how they do it. For the most part, Pick systems never differentiated between unknown and blank. Of course, the application developer was free to define it, which resulted in a lack of standardation.

Here's a general description of how Pick systems store data where ^ represents an attribute mark, hex FE, and / represents a value mark, hex FD.

RecordKey attribute1^attribute2^attribute3 value1/attribute3 value2/attribute3 value3^attribute4...

RecordKey must be unique and allows direct access to a record (aka row).
Attributes may contain multiple values and values may contain multiple sub-values, which I don't show in my example. All the Pick systems conform to this format and the data is stored in plain ascii. Of course they all differ at the physical file level in how they hash keys, etc

Records are stored in groups and a calculation is performed to determine which group a record is stored in. In other words, if the file has 100 groups and 1,000 records, on average, each group will have 10 records in it. This is how Pick can locate a record in a file with a million records with as few as ONE read.

If a Pick system does not define null, then both a blank value and an unknown value will be stored as nothing. I agree this is a deficiency!

In the example you gave relative to middle initial and age this could cause problems with the Pick non-SQL query language. If you select the file "with age < 40", then Joyce will be selected even though she might be 90 years old. However, Pick programmers and users are a practical lot and handle this with program code or other means. In this example, we could select "with age < 40 and with age # '' ". Of course, if this is a critical attribute, then the application code can force the user to enter a value.

This lack of database constraints leads to a flexible system but puts the burden on the application programmer rather than the DBA. In most Pick sites this the same person!

The Pick query to list your employee file would look like this; SORT EMPLOYEE FIRSTNAME SEX AGE The result would be;

EMPLOYEE       FIRSTNAME    SEX     AGE
1                       Joyce              F
2                       John                M        47

This assumes that FIRSTNAME, SEX and AGE are defined in the dictionary associated with this file. Each data file should have a dictionary but it's NOT required. Again, the lack of a constraint that can be both very good and very bad.

Since you aren't required to have a dictionary, you don't have to define attribute length or data type! Pick stores everything as a string and an attribute, value or sub-balue can be ANY length.

Of course, good programming practice dictates that the dictionary be carefully designed. Most Pick systems have an SQL option, generally ignored by Pickies, that does require such things but only if you want to use SQL.

Pick databases include; UniVerse and Unidata from IBM, Reality from Northgate Systems, D3 from Raining Data (formerly Pick Systems), mvBase from Raining Data, Revelation and Openinsight (GUI) from Revelation Technologies, jBASE from jBASE International, and several others.

An open source version of OpenQM from Ladybridge Systems (UK) is available from openqm.com.

The link to the UniVerse manual quoted above is at; http://publibfi.boulder.ibm.com/epubs/pdf/25119070.pdf

I would be glad to continue this discussion with anyone who so desires.

I told you this was a long post ;-)

Cheers,
Don
Ashwood Computer Co.
Cincinnati, Ohio

Hugo Kornelis wrote:
> On 16 Nov 2005 22:03:14 -0800, michael_at_preece.net wrote:
>
> >I've been accustomed to thinking of things either having a value or
> >not. If something has no value then, to me, its value is null. Its
> >value is an empty string (whatever "it" is). Different to having a
> >value of zero. Different to anything with a value.
>
> Hi Mike,
>
> But not different to something with the value '' (aka empty string).
>
>
> > Now, as I read up on
> >SQL, I find that null is supposed to mean "unknown".
>
> You might want to start reading other books then. Null is *not* supposed
> to mean "unknown". According to the ANSI standard, the "null value" (as
> it's called there) is "A special value, or mark, that is used to
> indicate the absence of any data value".
>
> Both Null and Unknown are important issues to understand when using SQL.
> Allow me to explain. Warning: long post ahead! <g>
>
> First: NULL.
>
> Let me give an example. I'll use tables - not realtional tables, but pen
> and paper tables. Forget that we're all using computers nowadays, and
> take a look at the two tables below (use a fixed width font).
>
> Emp# | First Name | Sex
> ------+------------+-----
> 1 | Joyce | F
> 2 | John | M
>
> Emp# | Age
> ------+-----
> 2 | 47
>
> What can you tell me about Joyce's age? The only correct answer to that
> question can be "nothing" - there is no information about Joyce's age in
> the tables above, and since you don't know her personally, there's
> nothing you know about her age.
>
> Now let's combine the tables. Note that the information doesn't change,
> only the way we preresent it does:
>
> Emp# | First Name | Age | Sex
> ------+------------+-----+-----
> 1 | Joyce | | F
> 2 | John | 47 | M
>
> Since we didn't change the information, only the representation, we
> should still know the exact same thing about Joyce's age: nothing. This
> is indicated by the empty cell in the Age column of the row for Emp# 1.
>
> This works for numeric columns, such as Age - since a blank cell clearly
> doesn't hold a valid number, it's obvious that the cell is indeed empty.
> But we have a problem with alphanumerical information. Damien already
> gave a greata example in his reply: the middle initial. Let's add this
> as an extra column to our table.
>
> Emp# | First Name | Age | Middle Initial | Sex
> ------+------------+-----+----------------+-----
> 1 | Joyce | | | F
> 2 | John | 47 | | M
>
> For reasons unrelated to this post, I don't know anything about John's
> middle initial - he might or might not have one, and if he does, I'm not
> privy to it. So I left the Middle Initial cell for John (Emp# 2) empty.
>
> On the other hand, I *know* that Joyce has no middle initial. I asked
> her yesterday, and she confirmed. Since I hate forgetting personal
> details, I immediately wrote down an empty string in Joyce's Middle
> Initial cell.
>
> But how will I be able to see the difference between John's empty cell
> and Joyce's cell that contains the empty string? I guess I'll have to
> use color codes, or something like that.
>
> In relational databases, that is solved by Null - a marker that
> explicitly denotes that there is no value in a column, to distinguish it
> from a column that holds an empty string, a numeric value of zero or a
> date value of "beginning of time" (or whatever).
>
>
> Next: UNKNOWN.
>
> Null is not the same as Unknown. Many people like to say that it is, but
> it's not true. A Null in a database is often a result of something begin
> unknown. I already admitted that I don't know John's middle initial. So
> you might say that John's middle initial is unknown (to me at least; I'm
> sure John does know), and that is represented by Null.
>
> But nowhere have I written Joyce's age is unknown. It might be - but
> there might be other reasone for her age being absent from the tables
> above. Maybe it's a permissions issue (she's working for another
> department; I've got full access to personal details of my department's
> personnel, but limited acces for other employees). Maybe the company's
> director is an old-fashioned guy who considers it rude to ask a lady her
> age - maybe I do know too well how old Joce is, but am precluded by
> company policy fromm entering it in the table. Or maybe you were right
> after all, and I simply don't know Joyce's age. Who'll tell??
>
> Null is the absence of a value. There can be many reasons for this
> absense. If it's important to know the reason, then this reason should
> be modelled as a seperate piece of information. Since there is no such
> data in the example tables above, it's clear that for this company, it's
> not important to know WHY nothing is known about Joyce's age.
>
> But since Joyce's age is Null, her age IS unknown *within the context of
> the database*. Any program that uses data from these tables, and any
> person that looks at these tables and has no further knowledge, will
> know nothing about Joyce's age. For those programs and people, Joyce's
> age will be not known (but only for them - so this is not the same as
> saying that Joyce age is unknown).
>
> I deliberately wrote "not known" above, to avoid confusion with
> "Unknown". As asid, Unknown is not Null. Unknown is a boolean value that
> extends the well-known True and False boolean values.
>
> Given the tables above, would you say that the statement "John is older
> than Joyce" is true or false?
>
> The only correct answer to that can be "I don't know". Since you know
> nothing about Joyce's age, you can't judge whether John is older than
> her or not.
>
> So can we say that the answer to the question above should be Null? No,
> we can't. Null means the absence of a value. But we just discusssed the
> statement about John's and Joyce's age, so there now IS an answer to the
> question if John is older. The answer would have been Null before I
> wrote these paragraphs, but the question has been answered since. Not
> with "yes" (True) or "no" (False), but with "beats me" (Unknown).
>
> And that's exactly how the ANSI standard defines Unknown - compare any
> value with Null, and the result will never be True or False, but always
> Unknown.
>
>
> >I can't
> >imagine having to write code where the "if a=b then result=true else
> >result=false" construct won't work - according to what I'm reading, if
> >either a or b is null then I should be setting result to unknown
> >instead.
>
> Well, you don't have to write such code. Just buy and install a decent
> mainstream RDBMS product. They have this logic embedded.
>
> Best, Hugo
> --
>
> (Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Fri Nov 18 2005 - 22:49:17 CET

Original text of this message