Re: Nulls, integrity, the closed world assumption and events

From: J M Davitt <jdavitt_at_aeneas.net>
Date: Mon, 08 Jan 2007 23:29:35 GMT
Message-ID: <ztAoh.9726$SJ3.2319_at_tornado.ohiordc.rr.com>


JOG wrote:

> David wrote:
> 
> 

>>Consider the following relation
>>
>> person(P,M,F) :- person P has mother M, father F.
>>
>>Suppose M,F are non-nullable foreign keys with enforced referential
>>integrity back into the person relation. By induction a non-empty
>>database would have to be infinite.
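The induction can be made concrete. Take any finite person relation whose parentage has no cycles. Some person in it always has parents who are not themselves rows of the relation, so the referential-integrity check fails. This is a minimal sketch that assumes the relation is a Python set of (P, M, F) tuples. The helper name `dangling_parents` and the sample data are hypothetical.

```python
def dangling_parents(person):
    """Return the (P, M, F) tuples whose M or F is not itself a key P.

    `person` is a set of (P, M, F) tuples with P as the key. Enforced
    referential integrity from M and F back to P requires an empty result.
    """
    keys = {p for p, _, _ in person}
    return {(p, m, f) for p, m, f in person if m not in keys or f not in keys}


# A small acyclic family: the oldest generation must name parents
# who are not (yet) rows of the relation.
person = {
    ("Ann", "Ann's mother", "Ann's father"),
    ("Bob", "Bob's mother", "Bob's father"),
    ("Cy", "Ann", "Bob"),
}
print(sorted(dangling_parents(person)))
```

Adding the missing rows only pushes the dangling references up one generation. A cycle, such as a row naming someone as their own mother, also satisfies the check. The induction therefore relies on ancestry being acyclic.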
>>
>>One possible solution is to allow M,F to be null. This proposal is at
>>odds with the purity of the predicate calculus.
> 
> 
> Would a more elegant and correct solution not be to have two relations:
> 
> person(P, Sex)
> parentage(P, M, F)
> 
> where P is the candidate key of parentage, but where P, M and F have
> enforced referential integrity (with a check constraint on sex) back to
> the person relation?
> 
> I'd recommend forgetting the database structuring when initially
> thinking about the problem anyway, and focusing on the propositions you
> are trying to model. At some point you will have _at least_ two
> statements about people with no parents, indicating that you have a
> proposition that does not fit the 'parentage structuring' and hence has
> no place there. This in turn indicates that one should have a separate
> person relation.
> 
> Exists:P1 with sex:F
> Exists:P2 with sex:M
> Exists:P3 with sex:M with Mother:P1 and Father:P2
> Exists:P4 with sex:F with Mother:P1 and Father:P2
> etc...
> 
> Clearly there are two types of propositions here.
> 
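JOG's two-relation design can be sketched in SQL. This is a minimal sketch using Python's built-in `sqlite3` module, and it assumes SQLite as the engine. The column names are hypothetical. A SQLite CHECK constraint cannot look into another table, so the "check constraint on sex" is approximated with one common workaround that JOG did not specify. Each parent column is paired with a fixed-sex column, and the pair is declared as a composite foreign key into person(P, Sex). The sample rows are JOG's four propositions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
    CREATE TABLE person (
        p   TEXT PRIMARY KEY,
        sex TEXT NOT NULL CHECK (sex IN ('F', 'M')),
        UNIQUE (p, sex)                  -- target for the composite FKs below
    );
    CREATE TABLE parentage (
        p     TEXT PRIMARY KEY REFERENCES person (p),
        m     TEXT NOT NULL,
        m_sex TEXT NOT NULL DEFAULT 'F' CHECK (m_sex = 'F'),
        f     TEXT NOT NULL,
        f_sex TEXT NOT NULL DEFAULT 'M' CHECK (f_sex = 'M'),
        FOREIGN KEY (m, m_sex) REFERENCES person (p, sex),  -- mother must be female
        FOREIGN KEY (f, f_sex) REFERENCES person (p, sex)   -- father must be male
    );
""")

# Two kinds of proposition, two relations, no NULLs anywhere.
con.executemany("INSERT INTO person VALUES (?, ?)",
                [("P1", "F"), ("P2", "M"), ("P3", "M"), ("P4", "F")])
con.executemany("INSERT INTO parentage (p, m, f) VALUES (?, ?, ?)",
                [("P3", "P1", "P2"), ("P4", "P1", "P2")])

rejected = False
try:
    # Swapped parents: (P2, 'F') and (P1, 'M') are not rows of person.
    con.execute("INSERT INTO parentage (p, m, f) VALUES ('P1', 'P2', 'P1')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

P1 and P2 appear only in person. Their missing parents are represented by the absence of a parentage row rather than by NULLs.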

I would go a bit further and "split" parentage; i.e., allow one to record knowledge of the mother and of the father, each only when known. In other words, parentage should be decomposed. After all, I know folks who don't know who either parent is, and others who know only one of their parents.
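This decomposition can be sketched with one relation per parent. A row is recorded only when that parent is known. This is a minimal sketch in SQLite via Python's `sqlite3`, and the table and column names are hypothetical. An unknown parent is simply the absence of a row. Declaring the child column as the key limits each person to at most one recorded mother and one recorded father.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE person (p TEXT PRIMARY KEY);
    -- One relation per kind of knowledge; a row exists only when it is known.
    CREATE TABLE mother (c TEXT PRIMARY KEY REFERENCES person (p),
                         m TEXT NOT NULL REFERENCES person (p));
    CREATE TABLE father (c TEXT PRIMARY KEY REFERENCES person (p),
                         f TEXT NOT NULL REFERENCES person (p));
""")
con.executemany("INSERT INTO person VALUES (?)",
                [("Ann",), ("Bob",), ("Cy",), ("Di",)])
con.execute("INSERT INTO mother VALUES ('Cy', 'Ann')")  # Cy: both parents known
con.execute("INSERT INTO father VALUES ('Cy', 'Bob')")
con.execute("INSERT INTO mother VALUES ('Di', 'Ann')")  # Di: only the mother known

# "Whose father is unknown?" is answered without any NULLs.
unknown_father = [row[0] for row in con.execute(
    "SELECT p FROM person "
    "WHERE NOT EXISTS (SELECT 1 FROM father WHERE father.c = person.p) "
    "ORDER BY p")]
print(unknown_father)
```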
> 

>>Another solution is to drop the enforced referential integrity
>>constraint. However it seems rather suspicious to pretend that some
>>parent is not a person (in the DB) even though they are mentioned in
>>the DB.
>>
>>A third solution is to regard the above person relation as bad because
>>it is at odds with the closed world assumption. Instead, it is better
>>to limit a person relation to something like
>>
>> person(P) :- P is a person
>>
>>and use other relations to represent the family tree, such as
>>
>> mother(M,C) :- M is the mother of child C
>> father(F,C) :- F is the father of child C
>>
>>Note that as it stands we have quite weak integrity constraints because
>>a person may have any number of mothers and fathers.
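The weakness is that nothing in mother(M,C) as written stops a child from appearing with two different mothers. The constraint that is missing is the functional dependency C -> M, i.e. C being a key. This is a minimal sketch of checking that dependency over a plain set of tuples. The helper name `fd_violations` and the sample data are hypothetical.

```python
from collections import defaultdict


def fd_violations(rel, lhs, rhs):
    """Return the lhs values that map to more than one rhs value.

    `rel` is a set of tuples; `lhs` and `rhs` are column positions.
    An empty result means the functional dependency lhs -> rhs holds.
    """
    seen = defaultdict(set)
    for row in rel:
        seen[row[lhs]].add(row[rhs])
    return {k for k, vals in seen.items() if len(vals) > 1}


# mother(M, C): column 0 is the mother, column 1 is the child.
mother = {("Ann", "Cy"), ("Eve", "Cy"), ("Ann", "Di")}
print(fd_violations(mother, lhs=1, rhs=0))  # -> {'Cy'}: Cy has two mothers
```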
>>
>>Alternatively we could represent birth events
>>
>> birth(L,T,P,M,F) :- Person P was born to M,F at location L at time T
>>
>>This could be keyed on attribute P, ensuring that each person can be
>>born at most once and therefore have at most one mother, one father,
>>one birthplace and one age.
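The birth relation keyed on P can be sketched in SQLite via Python's `sqlite3`. This is a minimal sketch. The column names follow the predicate, and the ISO-8601 text timestamps and sample data are assumptions. Mother and father are derived from the event as views instead of being stored separately. People with no recorded birth event, such as Ann and Bob below, need no NULLs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE person (p TEXT PRIMARY KEY);
    -- birth(L,T,P,M,F): keyed on P, so each person is born at most once.
    CREATE TABLE birth (
        l TEXT NOT NULL,
        t TEXT NOT NULL,                       -- ISO-8601 timestamp (assumed)
        p TEXT PRIMARY KEY REFERENCES person (p),
        m TEXT NOT NULL REFERENCES person (p),
        f TEXT NOT NULL REFERENCES person (p)
    );
    -- Parenthood is derived from the event rather than stored separately.
    CREATE VIEW mother AS SELECT m, p AS c FROM birth;
    CREATE VIEW father AS SELECT f, p AS c FROM birth;
""")
con.executemany("INSERT INTO person VALUES (?)", [("Ann",), ("Bob",), ("Cy",)])
con.execute("INSERT INTO birth VALUES ('Leeds', '1990-05-01T08:30', 'Cy', 'Ann', 'Bob')")

try:
    # A second birth event for Cy violates the key on P.
    con.execute("INSERT INTO birth VALUES ('York', '1991-01-01T00:00', 'Cy', 'Ann', 'Bob')")
    second_birth_rejected = False
except sqlite3.IntegrityError:
    second_birth_rejected = True

print(con.execute("SELECT m, c FROM mother").fetchall(), second_birth_rejected)
```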
>>
>>Interestingly (and IMO not surprisingly), representing the underlying
>>events that occur in space and time offers a good trade-off in terms of
>>integrity constraints, and fits in well with the closed world
>>assumption.
>>
>>Another perspective on this: relations represent facts not entities.
>>The whole idea of RM is to represent information *about* entities using
>>predicates. The idea that a record in a table represents an object
>>has more to do with the OO approach.
>>
>>Suppose we store marriage information in a person relation
>>
>> Person(P,S) :- Person P has spouse S.
>>
>>Clearly the spouse attribute would need to be nullable. It would be
>>better to store marriages in a separate relation, such as
>>
>> married(P1,P2) :- P1 and P2 are married
>>
>>But people can get married and divorced multiple times. To represent
>>this it may be better to store the information using events, e.g.
>>
>> wedding(L,T,P1,P2) :- P1 married P2 at location L at time T.
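Deriving "who is married at time T" from events can be sketched by replaying the events in time order. This is a minimal sketch. The post names only the wedding event, so the `Divorce` event type here is a hypothetical addition. Frozen dataclasses model the point that events are immutable. Pairs are stored as frozensets because marriage is symmetric.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: an event, once recorded, is never changed
class Wedding:
    l: str
    t: date
    p1: str
    p2: str


@dataclass(frozen=True)
class Divorce:  # hypothetical: not one of the relations named in the post
    t: date
    p1: str
    p2: str


def married_at(events, when):
    """Replay events up to and including `when`; return the married pairs."""
    married = set()
    for ev in sorted(events, key=lambda ev: ev.t):
        if ev.t > when:
            break
        pair = frozenset((ev.p1, ev.p2))
        if isinstance(ev, Wedding):
            married.add(pair)
        else:
            married.discard(pair)
    return married


events = [
    Wedding("Leeds", date(2000, 6, 1), "Ann", "Bob"),
    Divorce(date(2005, 3, 1), "Ann", "Bob"),
    Wedding("York", date(2007, 9, 1), "Ann", "Cy"),
]
print(married_at(events, date(2003, 1, 1)))
print(married_at(events, date(2008, 1, 1)))
```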
>>
>>There are some nice features about using events for relational models.
>>
>>1. Events are immutable.
>>2. Events give us history.
>>3. The relationships between entities can vary over time.
>>4. Events occur in space and time and therefore align well with some
>>form of closed world assumption that is localized in space/time.
>>
>>However, there is an increased computational burden if the current set
>>of relationships has to be calculated from the events. This is a
>>caching issue, such as when a bank caches an account balance.
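The bank analogy can be sketched as an append-only log of events plus a cached balance. The cache is updated on each append, so reading the balance does not require replaying the log. This is a minimal sketch. The class and method names are hypothetical, and amounts are assumed to be signed integers.

```python
class Account:
    """Append-only log of signed amounts plus a cached running balance."""

    def __init__(self):
        self._events = []   # the log: events are appended, never modified
        self._balance = 0   # cache derived from the log

    def post(self, amount):
        self._events.append(amount)  # record the event
        self._balance += amount      # keep the cache in step with the log

    def balance(self):
        return self._balance         # O(1) read from the cache

    def recompute(self):
        return sum(self._events)     # authoritative derivation from the events


acct = Account()
for amount in (100, -30, 45):
    acct.post(amount)
print(acct.balance(), acct.recompute())
```

The trade-off is the one the post describes. The cache makes reads cheap, but it is redundant state that must be updated with every event, whereas `recompute` is always correct but costs a pass over the whole history.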
>>
>>
>>David Barrett-Lennard
>
Received on Tue Jan 09 2007 - 00:29:35 CET
