Re: boolean datatype ... wtf?
Date: Wed, 06 Oct 2010 23:47:56 +0200
Message-ID: <9gnpa69fo7r8qjijurommgqdna5svo7vtg_at_4ax.com>
On Wed, 6 Oct 2010 11:14:17 -0700 (PDT), Brian wrote:
>I think you misunderstand what I'm saying: whenever there is no value,
No I did not misunderstand. Both "there is not supposed to be a value
here" and "there should be a value here but it has not been supplied"
are fine examples of reasons why a value is missing in the database
population. Note that with "missing", I do not mean that there should be
a value but isn't, but just there is no value. (Not being a native
English speaker, I don't know which association is prevalent when
English readers read "missing").
My point is that this distinction is not always relevant. I'll use an
example to explain. Let's say that the Customers table has a nullable
column "Gender", and a CHECK constraint that allows only the values Male
and Female. A NULL in this column can have two causes - either the
customer is not a natural person but a legal person (there is not
supposed to be a value for its gender), or the customer is a natural
person but never specified his/her gender (there is supposed to be a
value but it hasn't been supplied).
Many documents I read on or linked from sites such as dbdebunk.com or
thethirdmanifesto.com make this observation and then assume that,
because there are multiple situations that lead to a missing, one HAS to
somehow distinguish between those situations. But that need not be
necessary; that depends on the business. Only the domain expert can tell
us if, for a specific application, this distinction is important.
If it is, then this has to be modeled as an extra predicate. (Okay, in
this specific example, the extra predicate may not be required, as it
can be derived from the predicate that distinquishes natural persons
from legal persons - if such a predicate even exists in the model. But
in other situations, the extra predicate would be required.) That extra
predicate also has to allow for missing values (as there is no value
applicable for customers where the gender is recorded).
> In other words, either the
My background is NIAM - the Dutch counterpart of ORM (Object Role
Modeling). NIAM centers around Fact Types and their "readings". These
concepts are very similar to predicats, but there might be subtle
differences I am unaware of.
>either there isn't supposed to be a value, or there is supposed to be
>a value but it hasn't been supplied.
>predicate of the atomic formula that expresses the assertion does not
>have a place for an individual term that corresponds to the nullable
>column name, or there is a place but an individual variable instead of
>an individual constant occupies that place.
In NIAM, a missing fact can not be read. Regardless of why the fact is
missing. For the Fact Type gender, the reading form would be Customer
"The customer identified by customernumber <CustomerNumber> has the
<Gender> gender". For reading an individual fact, you would replace
<CustomerNumber> with the identifier of a customer, and <Gender> with
his or her gender, to get readings such as "The customer identified by
customernumber 125 has the male gender",
Your text above suggests (to me) that predicates in the relational model accept variables as well as values, so that a predicate that is similar to the reading form I used above could be instantiated to read "The customer identified by customernumber 222 has a *variable* gender". If that is a correct interpretation, it sheds a new light on many things I have read. Please confirm if I am not misinterpretating you.
> Why there is a variable
>instead of a constant in that place doesn't change the fact that the
>predicate still has a place for that term. So whether there is
>supposed to be a value determines the atom's predicate.
> It follows
>that since it can't be determined just from the absence of a value
>whether there is supposed to be a value, it must be explicitly stated
>that there is or is not supposed to be a value.
If, and only if, that distinction is relevant to the business. Otherwise you simply don't record the value and you're done.
> If you're only a null
>hater, then only two relvars are needed: one with an additional
>boolean attribute that specifies for each tuple whether there should
>be a value or not, and another relvar for the values that have been
>supplied. If you're also a boolean hater, then the relvar with the
>boolean attribute would have to be split as well.
I still don't see the leap here, nor the connection with hating or not
hating booleans.
If the business is not interested in recording is a value is absent
because there should not be a value or because it just happens to not be
there, then you need either one nullable column, or one extra relvar,
with no booleans either way.
And if the business does require you to record this distinction than you
either need two nullable columns, or two extra relvars. Again, with no
booleans in either situation.
You imply that it's possible to track both values and (for missing values) whether a missing value is applicable or not, in just two relvars with no nulls, but with booleans. I don't see how those relvars would be designed.
Best, Hugo Received on Wed Oct 06 2010 - 23:47:56 CEST