Re: Joins with nulls

From: Bob Badour <bbadour_at_golden.net>
Date: Sun, 24 Nov 2002 01:28:19 -0500
Message-ID: <6n_D9.1189$Ld.237511781_at_radon.golden.net>


"Finarfin" <finarfin_at_sympatico.ca> wrote in message news:7mZD9.11031$kS3.1845079_at_news20.bellglobal.com...
> Bob Badour wrote:
>
> >
> > If first name and last name are always known, I see no need to put
> > them in separate relations. The absurdity lies in your example and
> > not in any problem of data management. For someone with a single
> > name, like Meatloaf, make the last name an empty string. We know
> > what the value of an empty string is: a string of zero characters.
> > An empty string equals any other empty string and collates "less
> > than" any non-empty string.
>
> Hello:
>
> I admit the tone of my follow up may have been a little harsh, but:

I hadn't noticed.

> Isn't your method a means of dealing with missing information? As
> such, isn't it a method of dealing with a null?

No. No information is missing. We know Meatloaf's complete legal name.

> Also, with respect to the example, it is one everyone can relate to.

I am sure everyone can relate to first name and to last name, and I am sure everyone would consider separate relations for them absurd in the simple example you gave. Your example is absurd because you never established a need to model unknown information relating to anyone's name. If both first and last names are always known, no need exists to represent them in separate relations. You constructed a straw man to demonstrate absurdity. The absurdity just doesn't prove anything.

> More obscure examples I must deal with, involving data entered by
> labourers, would require more background. I struggled for some time
> to determine how to set things up so that when someone failed to
> enter a piece of information, I wouldn't be faced with a null. It is
> just not possible. At times there will be blanks. The correct answer
> when asking a question of the data in these cases is NULL, no known
> answer.

I disagree. In more involved examples where one allows the user to enter partial information, one requires a more involved design. NULL causes far more problems than it claims to solve. If the database knows of no correct answers to my question, I am happy to receive an empty set of answers. Again, no NULL required.

> Using the Name table as an example. An analogy to the situation I
> face is: Operator must get a person's name and enter it. Operator
> gets the first name, but due to circumstances, not the last. The
> first name is useable for some purposes and is kept as data. We don't
> know what (or if) the last name was. We have missing data or . . . a
> null. It is inherent in the underlying data.

Under what circumstances? For an operator on the phone, the operator can ask the person to repeat the name. For an operator entering data from a form, the operator can reject the form. Assuming the system does not require both names, how does the operator know they have the first name? Maybe they have the last name. Maybe the name is not missing but simply illegible on the form. Maybe Meatloaf filled out the form. Assuming the system can operate without both names, what objection do you have to using an empty string for the last name instead of NULL?

> For another analogy, consider Date's Suppliers and Parts database.
> Examine his table S. The person who gets the data is able to
> determine that S1 is Smith with status 20, but we don't know and can
> no longer determine the city.

In that case, the person cannot enter the data in S unless the dmbs provides a default city--the predicate of S requires a known city. Alternatively, the person must determine the city or enter the data in a different relation.

> We now have a table with missing data.

I disagree.

> From what I
> have seen, there is no rational way of [capturing partial information]
without nulls.

What else have you seen?

> I
> believe it is precisely this type of situation that Codd envisioned
> when he stated that a true relational system must have a means of
> dealing with missing data.

Currently, no theory addresses missing information. As such, any means of dealing with missing information is necessarily arbitrary and ad hoc. NULL gives a dangerously misleading impression of simplicity. Almost all users will get incorrect or misleading answers to their questions if a database design allows NULL. Even when users fully understand NULL and use great care, chances are high they will get incorrect or misleading answers at least some of the time.

> That being said, I must also get information from a particular
> database where no attempt has been made to eliminate NULLS that do
> NOT represent missing data. This is a nightmare. We don't know if
> NULL represents missing data or poor design .

When this is the case, NULL represents poor design.

>
> JE
Received on Sun Nov 24 2002 - 07:28:19 CET

Original text of this message