Re: NULLs
Date: Fri, 4 Jan 2008 08:07:50 -0000
Message-ID: <3JKdnRND78jKdODaRVnyvQA_at_pipex.net>
"Hugo Kornelis" <hugo_at_perFact.REMOVETHIS.info.INVALID> wrote in message
news:63qqn31k38u8ad1of2tb7n6r70cekr39bg_at_4ax.com...
> On Sat, 29 Dec 2007 07:00:47 -0800 (PST), JOG wrote:
> When implementing relations in what is currently usually called a
> relational database, it *is* possible to create a table per elementary
> fact type. In that case, no NULLs are ever needed. The facts
>
> * "Employee Jack is male"
> * "Employee Mary is female"
> * "Employee Jack is 43 years old"
> * "Employee JJ is 32 years old"
>
> are represented in two tables, like this:
>
> EmployeeAge EmployeeGender
> name | age name | gender
> -----+----- -----+--------
> Jack | 43 Mary | female
> JJ | 32 Jack | male
>
> But for various reasons (maintainability and performance being the most
> important), it is often preferable to combine these relations in a
> single table, like this:
>
> Employee
> name | age | gender
> -----+-----+-------
> Jack | 43 | male
> JJ | 32 |
> Mary | | female
This is the most sensible explanation I've seen for why people do this, but it still doesn't work for me. In the first place, you--like most of the other posters on this thread--seem to assume that a null is mostly used to indicate when a desired fact certainly exists but is unknown. Experience suggests this almost never happens in the real world. I work with dozens of databases at different sites every year, and the vast majority (all, for practical purposes) use nullable attributes mainly (almost exclusively) to permit multiple distinct fact types to be confused in one table. It appears to me that most practicing database designers/business analysts prefer to avoid even attempting to collect information that is not guartanteed to be known. If the business process doesn't guarantee they will know the employee age, they will prefer not to try to include facts about it at all. Obviously this is not a law of nature, but based on a lot of observations I say null used to mark missing information (as opposed to inapplicable information) is extremely rare. (Health warning: I work mainly with very large government and corporate databases.)
A consequence of this is , contrary to what you say above about performance being improved by using tables with a mix of nullable attributes, performance is often capable of being markedly improved by separating the fact types. Discovering and fixing such examples is how I make most of my living, and it is like shooting fish in a barrel. There are two ways to do it in practice: if the SQL DBMS supports rule-based table partitioning I can work my magic without changing the code, and if it doesn't, then I decompose the tables so they don't have to be scanned at run-time to separate the fact types.
Nor do I buy the maintainability argument. I've not done the math, but intuitively I just don't see the benefit. Then again, I think programmers probably write about two or three or ten times more code than they need to, so perhaps there is too much noise to tell.
Roy Received on Fri Jan 04 2008 - 09:07:50 CET