Re: Proposal: 6NF

From: Roy Hann <specially_at_processed.almost.meat>
Date: Sat, 30 Sep 2006 21:20:05 +0100
Message-ID: <zLmdnQxwyMHjTIPYnZ2dnUVZ8qudnZ2d_at_pipex.net>


"Brian Selzer" <brian_at_selzer-software.com> wrote in message news:WvxTg.11808$7I1.3639_at_newssvr27.news.prodigy.net...

> Recording propositions is only about 5% of the typical activity for a
> database. With today's technology, querying multiple tables is almost
> always more expensive than querying one,

Simply wrong.

Consider a table SHIPMENTS with, say, a million rows. One nullable attribute might be CAUSE_OF_LOSS. Since shipments are almost never lost, this attribute is almost invariably null. Now decompose it: move CAUSE_OF_LOSS into a separate CAUSE_OF_LOSSES table that holds a row only for the rare shipment actually lost. Supposing the DBA is even slightly skilled and chooses to physically key both tables on SHIPMENT_NUMBER, which query for the value of shipments lost to pirates will be quicker? The one that scans the million rows? Or the one that restricts the tiny CAUSE_OF_LOSSES table and then joins the (possibly empty) result with the SHIPMENTS table on SHIPMENT_NUMBER?
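To make the comparison concrete, here is a rough SQL sketch of the two designs and the two queries. SHIPMENTS, CAUSE_OF_LOSSES, SHIPMENT_NUMBER and CAUSE_OF_LOSS come from the example above; the SHIPMENT_VALUE column and the 'PIRACY' literal are invented for illustration, and the plans you actually get will of course depend on the DBMS.

    -- Undecomposed design: one wide table, CAUSE_OF_LOSS nullable and
    -- null in virtually all of the million rows.
    CREATE TABLE SHIPMENTS (
        SHIPMENT_NUMBER INTEGER       NOT NULL PRIMARY KEY,
        SHIPMENT_VALUE  DECIMAL(12,2) NOT NULL,   -- invented for illustration
        CAUSE_OF_LOSS   VARCHAR(30)               -- almost always null
    );

    SELECT SUM(SHIPMENT_VALUE)        -- has to visit all million rows
    FROM   SHIPMENTS
    WHERE  CAUSE_OF_LOSS = 'PIRACY';

    -- Decomposed design: CAUSE_OF_LOSS moves to its own table, which holds
    -- a row only for the rare shipment that was actually lost.  (SHIPMENTS
    -- then drops its nullable CAUSE_OF_LOSS column altogether.)
    CREATE TABLE CAUSE_OF_LOSSES (
        SHIPMENT_NUMBER INTEGER     NOT NULL PRIMARY KEY
                        REFERENCES SHIPMENTS (SHIPMENT_NUMBER),
        CAUSE_OF_LOSS   VARCHAR(30) NOT NULL
    );

    SELECT SUM(S.SHIPMENT_VALUE)      -- restrict the tiny table first, then
    FROM   CAUSE_OF_LOSSES C          -- join the few (or zero) survivors back
    JOIN   SHIPMENTS S                -- to SHIPMENTS on the common key
           ON S.SHIPMENT_NUMBER = C.SHIPMENT_NUMBER
    WHERE  C.CAUSE_OF_LOSS = 'PIRACY';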

I would argue that the limitations of today's technology are precisely why this approach is so useful. Not only does it make it easier to reason about what is going on, it also helps the real system go like blazes. I regularly apply such decompositions to solve performance problems in systems whose tables hold hundreds of millions of rows.

[snip]
> For example, if you have a table with 30 columns where 15 are NULLable,
> you would need as many as 16 tables to eliminate NULLs.

That is a far-fetched possibility. Maybe you could construct a hypothetical table with nullable attributes that would have to be decomposed that way, but in practice it doesn't happen. One rarely needs more than three or four additional tables, and often fewer. And as noted above, they often actually improve performance.
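The count stays small because, in practice, nullable attributes tend to come in groups that are present or absent together, so one extra table serves a whole group. A rough sketch of such a group for the shipments example (all names here are invented, not from the original):

    -- The loss-related attributes of a shipment are all known, or all
    -- unknown, together, so one table covers the whole group.
    CREATE TABLE SHIPMENT_LOSSES (
        SHIPMENT_NUMBER INTEGER       NOT NULL PRIMARY KEY
                        REFERENCES SHIPMENTS (SHIPMENT_NUMBER),
        CAUSE_OF_LOSS   VARCHAR(30)   NOT NULL,
        DATE_OF_LOSS    DATE          NOT NULL,
        LOSS_VALUE      DECIMAL(12,2) NOT NULL
    );

Fifteen nullable columns that fall into, say, four such groups need only four extra tables, nowhere near the worst-case sixteen.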

Roy
