Re: NULLs

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID>
Date: Tue, 08 Jan 2008 00:56:43 +0100
Message-ID: <1nd5o3ln30826uemn4t35j7qea0lcp33sr_at_4ax.com>


On Fri, 4 Jan 2008 08:07:50 -0000, Roy Hann wrote:

(snip)
>This is the most sensible explanation I've seen for why people do this,

Hi Roy,

Thanks! :)

> but
>it still doesn't work for me. In the first place, you--like most of the
>other posters on this thread--seem to assume that a null is mostly used to
>indicate when a desired fact certainly exists but is unknown.

No. I assume that a NULL is always (not mostly) used to indicate that a fact that could have been stored is missing. It does not imply that the fact is desired, nor that it certainly exists, nor that it is unknown.

Or rather, I am trying to put across that a NULL *should* always be used like that - I am fullly aware that this is not how NULLs are mostly used in the real world :)

> Experience
>suggests this almost never happens in the real world. I work with dozens of
>databases at different sites every year, and the vast majority (all, for
>practical purposes) use nullable attributes mainly (almost exclusively) to
>permit multiple distinct fact types to be confused in one table.

That is indeed a common use of NULL. And though I agree that it is far too common, I'd wager that I'm not as vehemently opposed to this as you are.

When I teach modeling, I try to give the students a feeling for where to draw the line between "required" normalization and "overdone" normalization. Using Dutch addresses as an example, one might argue that city is functionally dependant on postal code. As is, in fact, street name (Dutch postal codes form a very fine grid - most streets span two postal codes [one for odd, one for even house numbers], longer streets have even more). And one might then go on and argue that the area code of the phone number is in turn functionally dependant on the city. But, in the vast majority of applications, this is not useful at all. Most applications just want to print an address on a form, stick a stamp on it and be done. And most applications don't even care about the area code and local number as seperate attributes - they just need to be able to punch a number to contact a customer. So I consider a table like

   Persons (SSN (PK), Street, HouseNo, PostalCode, City, PhoneNo) to be adequately normalized - even when purists would prefer a design such as

   Persons (SSN (PK), PostalCode (FK), HouseNo, LocalPhoneNo)    PostalCodes (PostalCode (PK), Street, City (FK))    Cities (City (PK), PhoneAreaCode)

Back from a digression - I've also seen situations where there were mutually exclusive attributes, but I saw absolutely no harm in keeping them in the same table, with a proper constraint to prevent them both being populated. This should not be a common practice, but there are cases where it's good enough.

> It appears
>to me that most practicing database designers/business analysts prefer to
>avoid even attempting to collect information that is not guartanteed to be
>known.

Definitely true. Though I'm not always sure if the designer is to blame, or the end user (or decision maker) refusing to commit to ensure that all data is collected at data entry time.

>Nor do I buy the maintainability argument.

I *think* that this is partly historic (some decades ago, the maximum number of tables in a database was much lower than at present), partly perceived (imagine opening up some management tool and being presented with a list of thousands of tables...), and partly what people have become accustomed to.

Best, Hugo Received on Tue Jan 08 2008 - 00:56:43 CET

Original text of this message