Re: What does this NULL mean?

From: dawn <dawnwolthuis_at_gmail.com>
Date: 11 Dec 2005 07:00:20 -0800
Message-ID: <1134313220.237634.172550_at_g47g2000cwa.googlegroups.com>


John Eggert wrote:
> David Portas wrote:
>
> > John Eggert wrote:
> >
> >>
> >> Ok, I've been somewhat ignoring this entire topic lately, but I have a
> >> real world situation that I'd like to see a solution for. I am currently
> >> moving the operator reporting for an industrial process from a paper
> >> system to a computer based system. One thing that is occuring with the
> >> paper system is:
> >>
> >> An operator does what are called 'rounds'. They do certain things at a
> >> certain time. Most of these are manual checks of various circuit
> >> parameters. Take one of these as an example: The amount of activated
> >> carbon per tank ( I like this one because there is no automated method
> >> for this measurement at this time ). They then write this number down. On
> >> occasion, it is not possible to get the value. They are usually busy
> >> somewhere else. This information (the fact that the operator missed a
> >> round) is indicated by the blank cell in the hand written report sheet. I
> >> would interpret this as a NULL value. How can I capture the same data
> >> (the missing information IS the data) in a database? Just to be clear,
> >> the empty cell in the completed operator's report sheet IS USEFULL DATA.
> >> It isn't 0, it isn't unknown, it is missing data. The fact that >>the
> >> data is missing<< is the data I need to capture. I need to be able to
> >> quantify how frequently the data is missing, if it is any particular
> >> operator or any particular part of the process etc. I am using "NULL" as
> >> a default value. Given the long debate on the good bad and ugly of NULL,
> >> what is wrong with that? Isn't this exactly the situation where a NULL
> >> value is needed, appropriate and functionally usefull?

This is an excellent case. In this case you are both saying that an attribute value that is modeled in the system is missing and that you have a proposition you need to capture something like <data value> is missing.

If you want a normalized solution, you would have a table, C, that captures a row for each of the actual values -- those not missing and, in spite of objections I'm sure, you could record on the parent table P to this one a flag related to the proposition of data missing. This sounds redundant. A True for missing data in P would relate to the absense of a row in C. However, there are other reasons for data missing, including that it wasn't time to test for this value or the data from the test has not yet been entered.

I don't particularly like this implementation, but it is normalized.

> >> With respect to the normalization aspect. How can one better normalize a
> >> system so that one can capture the fact that someone did not fill in
> >> something that should normally have been filled in but for unknown
> >> reasons did ( and possibly could ) not? If I have single attribute tables
> >> I can still have the missing data. How can I further normalize a single
> >> number?

What I would prefer is this:

You have data you want to capture for an entity, P, that could be numeric, could be "NA" for not available, and could be empty. If a sheet is submitted with a number -- the value is numeric; if the sheet is submitted with a blank, then we know something -- we know the value is not available. If we do not have a sheet, then the value (yes, I said value) is null (in my 2VL). The design might be that there is no P if there is no sheet, so you might be able to ignore that.

A problem, however, crops up with the type system. This attribute is tagged as numeric, most likely, so NA won't cut it. The only non-numeric possibility then is NULL. This is also how you get all 9's or all 8's to mean different conditions in numeric data -- because the type of the value isn't strictly numeric, but the type is defined to be so.

Bottom line is that I agree that in a SQL-DBMS implementation this would be a possible place for a NULL because you don't want to define a new type that has numeric values plus NA and other options are also not attractive. It isn't good theory, but it is possibly a best-practice anyway.

Just my two cents. --dawn Received on Sun Dec 11 2005 - 16:00:20 CET

Original text of this message