Re: NULLs

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 31 Dec 2007 09:51:35 -0500
Message-ID: <Wl7ej.2371$se5.1505_at_nlpi069.nbdc.sbc.com>


"JOG" <jog_at_cs.nott.ac.uk> wrote in message news:b844301b-bcdd-4098-8c73-08ce49165238_at_w56g2000hsf.googlegroups.com...

> On Dec 31, 8:32 am, "Brian Selzer" <br..._at_selzer-software.com> wrote:

>> "JOG" <j..._at_cs.nott.ac.uk> wrote in message
>>
>> news:aaf9d91c-fb76-412c-9dbb-213a0f3f70ce_at_i12g2000prf.googlegroups.com...
>>
>>
>>
>> > On Dec 30, 10:46 pm, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>> >> "JOG" <j..._at_cs.nott.ac.uk> wrote in message
>>
>> >>news:c9ec9790-f950-4407-b1f8-e5b8f4d07e91_at_v4g2000hsf.googlegroups.com...
>>
>> >> > On Dec 30, 7:23 pm, "Brian Selzer" <br..._at_selzer-software.com>
>> >> > wrote:
>> >> >> "stevedtrm" <steved..._at_hotmail.com> wrote in message
>>
>> >> >>news:a3716365-7bc4-4a75-ab58-ee7ac957c299_at_n20g2000hsh.googlegroups.com...
>>
>> >> >> >> > If everyone is clear NULLS shouldn't be used, why the debate
>> >> >> >> > as
>> >> >> >> > to
>> >> >> >> > what
>> >> >> >> > to do about them ?
>>
>> >> >> >> Because SQL allows NULL and even promotes the idea that NULL
>> >> >> >> solves
>> >> >> >> some
>> >> >> >> problem instead of introducing many.
>>
>> >> >> > So everyone is agreed that NULLs shouldn't appear anywhere, and
>> >> >> > its
>> >> >> > just a matter of time before NULLS become a legacy problem and a
>> >> >> > relational language supercedes SQL?
>>
>> >> >> No, Steve, not everyone is agreed.
>>
>> >> >> > Are the two solutions I suggested before the widely accepted as
>> >> >> > resolutions to the two problems NULLs were introduced to
>> >> >> > eradicate?
>>
>> >> >> >>Missing information? Then surely there should just be no tuple?
>>
>> >> >> A row may contain a lot more information that would otherwise be
>> >> >> useful.
>> >> >> Would you tell a potential customer that you won't serve them
>> >> >> because
>> >> >> you
>> >> >> don't know everything about them? I think not! Haven't you been
>> >> >> asked
>> >> >> for
>> >> >> your e-mail address at a retailer? What would you think if the
>> >> >> cashier
>> >> >> told
>> >> >> you to get lost for refusing to supply it?
>>
>> >> >> >>To indicate that there can be no value? Why not a seperate table
>> >> >> >>with
>> >> >> >>a boolean value in the non-key column?
>>
>> >> >> In my opinion, null should only ever be used to indicate "There
>> >> >> should
>> >> >> be
>> >> >> a
>> >> >> value here, but it hasn't been supplied." That there should be a
>> >> >> value
>> >> >> or
>> >> >> that there must be a value are structural issues, and therefore
>> >> >> should
>> >> >> not
>> >> >> be subject to interpretation. "No value here" invites
>> >> >> interpretation,
>> >> >> because it begs the question, "Should there have even been a
>> >> >> value?"
>>
>> >> >> Structural information is lost by separating a nullable column into
>> >> >> its
>> >> >> own
>> >> >> table: the fact that there should always be a value, even if it
>> >> >> hasn't
>> >> >> been
>> >> >> supplied is lost.
>>
>> >> > I think this could be put better. It is not that there should always
>> >> > be a value, but rather that "a value exists in the real world".
>> >> > However I don't really see what is structurally lost by complete
>> >> > decomposition to eliminate nulls, that can't be remedied via
>> >> > constraints.
>>
>> >> If you complete the interpretation such that all missing values are
>> >> supplied, then the resulting database should be a set of relations.
>> >> When
>> >> there is a nullable column on a table, there will a 1:1 correspondence
>> >> between values in the completed relation. The problem with separating
>> >> a
>> >> column that can contain nulls into another table is that when all
>> >> missing
>> >> values are supplied, there is no requirement that for every tuple in
>> >> the
>> >> referenced relation there must be one and only one tuple in the
>> >> referencing
>> >> relation. Adding a constraint to guarantee that 1:1 correspondence
>> >> would
>> >> be
>> >> the same as not permitting the information to be missing in the first
>> >> place.
>> >> Thus the only ways that I can see to remedy that would be either to
>> >> add a
>> >> boolean column, or to add another table.
>>
>> > Could you clarify why you think using nulls maintain structural
>> > information? If one is saying that a datum /must/ exist as part of the
>> > proposition, why is there a nullable column allowing it not to?
>> > Perhaps you are saying that the nullable column is stating that the
>> > value "should be" part of the proposition, but "we're being a bit lax
>> > about it" whereas a decomposed version does not?
>>
>> > If so, I am unclear at the moment what the value of this is - perhaps
>> > you could refer me to an example of when it would be necessary?
>>
>> There is a functional relationship between a key and each non-key column.
>> If null can only indicate that there should be a value but it hasn't been
>> supplied, then the presence of a null doesn't alter that functional
>> relationship. In other words, for each particular key value, there is
>> one
>> and only one value for each non-key column--even if it is unclear what
>> that
>> value is. In a perfect world, there is no missing information, and a
>> table
>> that permits nulls becomes a relation that doesn't. In such a relation,
>> the
>> relationship I'm referring to would be called a functional dependency.
>> By
>> separating a nullable column into its own table, that functional
>> relationship is lost.
>
> But by allowing a column to be nullable aren't you saying that the
> "functional relationship" wasn't valid in the first place.
>

No. The functional relationship is still valid: you know that for every key value there is one and only one non-key value for each non-key column, and you know that the non-key value belongs to the column's domain. In other words, you know that there are mappings between key values and members of each non-key column's domain, but in the imperfect world that allows null, you just don't know which mapping it is.

>> Just as is the case when a relation R {A, B, C} where
>> A is the key is separated into a pair of relations, S {A, B} and T {A, C}
>> such that T[A] IN S[A], since the values for A, B and C no longer appear
>> in
>> the same tuple, it is now possible for there to be a value for A and B
>> without a value for C: the functional relationship between A and C is
>> lost.

>
> I appreciate the argument for requiring extra constraints to maintain
> all information after decomposition under full info. However, if you
> are allowing nulls then surely you are already acceding that those
> constraints can be violated.
>

Not if null can only indicate, "there should be a value here, but it hasn't been supplied."

If you split a table with a column that allows nulls into a set of relations that don't, you need to add something that indicates whether or not there should be a value. For example, suppose you have a table, R {A, B, C} where A is the key and C allows nulls. You can split it up two ways, the first,

R1 {A, B, C!} and R2 {A, C} such that R2[A] IN R1[A] and where C! draws its values from the boolean domain In this case the value for C! in R1 indicates whether there can be a value for C. Without C!, it cannot be determined whether there should have been a value for C but it hasn't been supplied. (Of course, you could overload the interpretation of R1 to mean also that there can always be a value for C--indeed if the values of C! are universally true, then it makes sense to do it, but clearly what was a matter of determination is now a matter of interpretation.)

Another option would be the set of relations,

R3 {A, B}, R4 {A} and R5 {A, C} such that R5[A] IN R4[A] and R4[A] IN R3[A].

In this case tuples in R4 indicate whether there can be a value for C. Without R4, as above it cannot be determined whether there should have been a value for C but it hasn't been supplied.

In the table, R, however, the fact that each C appears in the same row with values for A and B indicates that there can always be a value for C, even if it hasn't been supplied.

>>
>>
>>
>> >> >> That's not to say that it isn't possible to eliminate
>> >> >> nulls altoghether, it's just that you need several separate tables
>> >> >> instead
>> >> >> of one: one table for the values that must always be present, one
>> >> >> table
>> >> >> for
>> >> >> each nullable column (or set of columns in those cases where
>> >> >> whenever
>> >> >> one
>> >> >> value is supplied, another value must also be supplied) that
>> >> >> indicates
>> >> >> whether or not there should be a value (or set of values), and one
>> >> >> table
>> >> >> with a column (or set of columns) for those values that just happen
>> >> >> to
>> >> >> have
>> >> >> been supplied.
>>
>> >> >> Note also that the spectre of 3VL disappears when null indicates
>> >> >> only
>> >> >> "there
>> >> >> should be a value here, but it hasn't been supplied." Since there
>> >> >> should
>> >> >> be
>> >> >> a value, and only one value, a null can simply be replaced by an
>> >> >> exclusive
>> >> >> disjunction that encompasses all of the values in the attribute's
>> >> >> domain.
>> >> >> So there is no need for 3VL, 2VL suffices. (For infinite domains,
>> >> >> a
>> >> >> quantifier could be used that means instead of "there is...,"
>> >> >> "there
>> >> >> is
>> >> >> exactly one....")
> Received on Mon Dec 31 2007 - 15:51:35 CET

Original text of this message