Re: Love or hate, or? domains with cardinality two

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Tue, 3 Nov 2015 17:13:18 -0500
Message-Id: <20151103171318.49b0db80.jklowden_at_speakeasy.net>


On Mon, 2 Nov 2015 18:29:54 +0100
Nicola <nvitacolonna_at_gmail.com> wrote:

> Employee(id, dept, mgr?)
>
> where mgr? is boolean. A tuple (x,y,true) means that employee x
> belongs to department y and x is a manager of y. A tuple (x,y,false)
> means that employee x belongs to department y but x is not a manager
> of y.
>
> I was arguing about how schemas like that are commonly found in
> practice, and about how I feel uncomfortable with them, because, let
> apart the obvious difficulties with enforcing some integrity
> constraints (e.g., a department should have one and only one manager
> at a time) in current DBMSs, truth of a predicate is coded as a value
> of a domain. Suprisingly (for me), she did not dismiss the approach
> at once, but pointed out that you are trading a predicate (Manager
> (id)) for a couple of new constant symbols and the two things should
> be essentially interchangeable (from a logical point of view).
...
> But maybe my intuition is wrong. What do you think? Are there
> compelling reasons to avoid boolean domains in logical database
> design?

At a logical level, there's nothing to argue about, right? As your logician friend said, whether "true" is represented symbolically or derived from other facts doesn't much matter.

I think your intuition is nevertheless correct for two practical reasons.

  1. Because SQL has no operator for relational division, FORALL tests are hard to express, and therefore to enforce. That leads people to choose boolean columns over convoluted quantification queries. But unless the test is somehow otherwise implemented, the database is subject to becoming inconsistent. If you skip the boolean, you avoid that potential inconsistency.
  2. Very often, "2 is the wrong answer". What starts out as seemingly simple -- is manager or not, or loves/hates (terrible contrived example) -- turn out to have more attributes or states (perhaps the manager's title or managing relationship, or how every New Yorker I know feels about the city). Better to describe facts and infer truth, because that way as the facts become more complex the derivation of "is manager" can change with them.

My problem with the love/hates example (other than its artificiality) is that it seems a poor model for any domain of discourse I can imagine. In what realm do we want to know who are the lovers and haters? Are not the actors people, and their connections what is interesting?

I would tell the designer that he's likely elevated data to metadata, because to model new emotions requires a schema change. That's not *wrong*, because as you say the two models are logically interchangeable. It will prove inconvenient, though, if a dynamic aspect of the domain of discourse has been modelled as DDL.

--jkl Received on Tue Nov 03 2015 - 23:13:18 CET

Original text of this message