Re: teaching relational basics to people, questions

From: Mr. Scott <do_not_reply_at_noone.com>
Date: Sun, 22 Nov 2009 03:26:23 -0500
Message-ID: <H8KdnceVl4MtaJXWnZ2dnUVZ_sidnZ2d_at_giganews.com>


"Roy Hann" <specially_at_processed.almost.meat> wrote in message news:3eudnYx_I7CpiZXWnZ2dnUVZ7qSdnZ2d_at_pipex.net...
> Sampo Syreeni wrote:
>
>> [snip] And since
>> one would have to have a bona fide range datatype, building in
>> handling for infinite ranges would also be easy; that'd get rid of one
>> of the most persistent reasons why people incorporate nulls into
>> designs.
>
> I think you are being excessively optimistic. The most persistent (and
> most common) reason people incorporate nullable columns into designs is
> because they have a misplaced desire to minimize the number of tables in
> the design, and think that conflating multiple fact types in one table
> is clever, efficient, and harmless.

I think you are oversimplifying. The most persistent (and most common) reason people incorporate nullable columns into designs is the not so misplaced desire to provide for information that is relevant but not required, and that doesn't necessarily involve conflating multiple fact types in one table. Each table design represents a family of interdependent predicates. For example, the the simple table design,

{EMPLOYEE, DEPENDENTS} KEY(EMPLOYEE) could represent the interdependent predicates,

'there is an employee <EMPLOYEE>' iff
'employee <EMPLOYEE> claims <DEPENDENTS> dependents.'

The predicates are interdependent. Given a row (EMP1, 2), the assertion that 'there is an employee EMP1' implies that 'employee EMP1 claims 2 dependents' due to the functional dependency EMPLOYEE -> DEPENDENTS, while at the same time the assertion that 'employee EMP1 claims 2 dependents' implies that 'there is an employee EMP1.'

Permitting nulls in the DEPENDENTS column doesn't change the fact that each employee claims some number of dependents, so long as the predicates remain interdependent. What it does do is to permit recording the assertion that there is an employee even when the precise number of dependents to be claimed is as yet unknown. I should point out that splitting up the table in order to eliminate the nulls doesn't fix anything. For example, splitting the table into

T1 {EMPLOYEE} and T2 {EMPLOYEE, DEPENDENTS}

where T1 represents the predicate,

'there is an employee <EMPLOYEE>,'

and T2 represents the predicate,

'employee <EMPLOYEE> claims <DEPENDENT> dependents,'

requires the introduction of referential constraints from T2 to T1 and also from T1 to T2 since the original predicates are interdependent. Dropping the referential constraint from T1 to T2 is a problem because the fact that each employee claims some number of dependents is no longer explicitly specified nor is it a logical consequence of the constraints that remain.

The problem you're referring to, the one which involves conflating multiple fact types, occurs only when the relationship between the predicates is disjunctive. For example,

'there is an employee <EMPLOYEE>' or
('there is an employee <EMPLOYEE>' iff
'employee <EMPLOYEE> claims <DEPENDENT> dependents.')

It should be easy to see that this mirrors the split tables T1 and T2 without the referential constraint from T1 to T2. While one can't determine whether null indicates that an employee doesn't claim dependents or that the number of dependents claimed is not yet known, one also can't determine whether the absence of a row in T2 indicates that an employee doesn't claim dependents or just that the number of dependents claimed is not yet known.

What troubles me most about eliminating nulls just for the sake of eliminating nulls is that there isn't always a precise mapping between a database scheme that permits nulls and one that doesn't. Information is lost, such as the fact that each employee claims some number of dependents in the example above.

> --
> Roy
>
>
Received on Sun Nov 22 2009 - 09:26:23 CET

Original text of this message