Re: NULLs: theoretical problems?

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 13 Aug 2007 10:37:03 GMT
Message-ID: <jvWvi.13764$eY.1930_at_newssvr13.news.prodigy.net>


"JOG" <jog_at_cs.nott.ac.uk> wrote in message news:1186971690.979746.159320_at_l70g2000hse.googlegroups.com...
> On Aug 8, 11:43 am, "sinister" <sinis..._at_nospam.invalid> wrote:
>> Many discussions point out one deficiency of NULLs: that they collapse
>> multiple, distinct concepts into one ("no value possible," "value
>> missing,"
>> "value not available at this time", etc).
>>
>> What are the other theoretical problems? My impression from skimming
>> some
>> threads in this ng is that some anomalies might occur, maybe having to do
>> with NULLs and joins, or NULLs and keys composed of more than one field,
>> but
>> I'm not sure.
>
> A database stores true propositions. A statement with a hole in it (an
> SQL null) is not a proposition, and hence is a theoretical abhorrence.
>
> Or from a different angle you might want to consider that a relation
> is a set of tuples. A tuple must contain a value in every position, or
> it is not a valid tuple (not being a subset of the cartesian product
> of the domains being considered). Hence an SQL-null is a theoretical
> abhorrence.
>
> The solution is of course to decompose a relation around its key so
> that no nullable columns are required. This results in a schema with
> more relations, and tends to produce queries with more joins, but
> without any theoretical or logical flaws.
>

It is true that it is always possible to decompose a relation around its key(s) so that no nullable columns are required, but the process isn't as simple as you imply. If not done correctly, there is a danger of theoretical and logical flaws. As you decompose such a relation, there are questions you must answer about each nullable column:

(1) Can there always be a value for this column? (2) Whenever a value is supplied for this column, must there also be a value for another column?
(3) Is there already a relation with the same heading as one of the decompositions?

These questions control how to decompose the relation and which integrity constraints are needed.

If (1) is false, then what is needed is two additional relations and two additional integrity constraints. One relation is needed for the case where there cannot be a value; one relation is needed for the case where there can be a value, but it's missing; one referential integrity constraint is needed from the relation with the value to the one for the case where there can be a value, and one integrity constraint is needed to prevent the same tuple from appearing in both the relation for the case where there cannot be a value and the relation for the case where there can be a value.

If (1) is true, then all that is needed is one extra relation and one integrity constraint. The extra relation is for the case where there can be a value, but it's missing; the integrity constraint ensures that whenever a tuple with the value appears, there is a tuple without one.

If (2) is true, then "value" in the preceding two paragraphs is not just a single value for a single attribute, but a set of values for a set of attributes.

If (3) is true, then the decomposition may introduce redundancy into the database, so the existing relation must be examined to determine if its tuples share the same meaning as those for any of the decompositions. If it does, then it should be treated as one of the relations required by (1).

> Some view such decomposition as being computationally inefficient,
> others reply that this is a physical implementation issue and not a
> concern of the logical model.
>
> Given the fact that it can generate much longer queries, I sometimes
> find myself allowing nulls in personal home-brew projects out of sheer
> laziness. However I do suffer from internal pangs of guilt during the
> process ;)
>
Received on Mon Aug 13 2007 - 12:37:03 CEST

Original text of this message