1NF, missing/unknown data, and structured values

From: <ralphbecket_at_gmail.com>
Date: 1 May 2006 18:13:57 -0700
Message-ID: <1146532437.707440.176740_at_i39g2000cwa.googlegroups.com>



My understanding is that a table is in 1NF if
- it is a set of rows (i.e., without duplicates)

Can someone confirm for me that there are no other constraints on column values under 1NF? In particular, is there any reason why column values should not themselves have structure? For example, dates seem like a perfectly good type for a column, but dates are structured values from which I can project day, month, or year.

If this is so, surely other structures such as lists and various user-defined data types should also be supported, provided they support tests for (in)equality as required by the RM (so higher order types are probably out). Admittedly you'd probably need a richer language than SQL to manipulate these structure values effectively.

Taking this idea further, we could go on to define a collection of "maybe" types (to borrow from functional programming parlance) to handle situations where we have missing or inapplicable data, without the need to break the RM logic with nonsense like NULLs.

That is, I could define a polymorphic type maybe_known(T) ranging over values of any type T, with two data constructors, known(X) and unknown, where X is a value of type T. For example, I could use known(age) as a column type to distinguish between people whose ages are known to the DB versus those that are not. If I want to, say, calculate the mean age, I must first filter out rows whose ages are unknown, then project the remaining ages from the known(age) values, then calculate their mean. In this way I am forced to ask "what is the mean age of those that are known?" rather than just "what is the mean age?" which my DB cannot answer.

The same idea can be extended further: how about a maybe_applicable(T) type with data constructors applicable(X) and not_applicable. I could even nest these types: the type
maybe_applicable(maybe_known(spouse_name)) covers
- whether spouses are applicable (e.g., children
can't be married)
- spouse names where applicable, but unknown

  • spouse names where applicable and known.

I believe such an approach avoids both the problems with NULLs (they break everything) and the problem of entirely forbidding structured data in columns (normalisation can lead to a large number of tables).

Question: is this already supported in SQL and the various products out there? If so, why would anyone still use NULLs? If it is not supported, what is the reason?

Ta,
-- Ralph Received on Tue May 02 2006 - 03:13:57 CEST

Original text of this message