Re: Nulls, integrity, the closed world assumption and events

From: JOG <jog_at_cs.nott.ac.uk>
Date: 8 Jan 2007 16:17:04 -0800
Message-ID: <1168301824.859993.67190_at_v33g2000cwv.googlegroups.com>


David wrote:

> JOG wrote:
> > David wrote:
> >
> > > Consider the following relation
> > >
> > > person(P,M,F) :- person P has mother M, father F.
> > >
> > > Suppose M,F are non-nullable foreign keys with enforced referential
> > > integrity back into the person relation. By induction a non-empty
> > > database would have to be infinite.
> > >
> > > One possible solution is to allow M,F to be null. This proposal is at
> > > odds with the purity of the predicate calculus.
> >
> > Would a more elegant and correct solution not be to have two relations:
> >
> > person(P, Sex)
> > parentage(P, M, F)
> >
> > where P is the candidate key of parentage, but where P, M and F have
> > enforced referential integrity (with a check constraint on sex) back to
> > the person relation?
>
> This is similar to the preferred solution in my original post. The
> birth event predicate allows only a subset of the persons to have their
> mother and father specified.
>
> > I'd recommend forgetting the database structuring when initially
> > thinking about the problem anyway and focus on the propositions you are
> > trying to model - at some point you will have _at least_ two statements
> > about people with no parents, indicating you have a proposition that
> > does not fit into the 'parentage structuring', and hence has no place
> > being there. This in turn indicates one should have a separate person
> > relation.
>
> Agreed.
>
> > Exists:P1 with sex:F
> > Exists:P2 with sex:M
> > Exists:P3 with sex:M with Mother:P1 and Father:P2
> > Exists:P4 with sex:F with Mother:P1 and Father:P2
> > etc...
> >
> > Clearly there are two types of propositions here.
>
> It is clear after thinking about the closed world assumption. To the
> naive the claim that every person has a mother and father seems
> correct.

If we don't know the father of a person, then what can we say about that? One thing:

"Person P has a father, but who that father is, is missing."

That is the proposition that should be recorded. If we don't even know whether the person had a father at all (cue examples of virgin births, or cloned sheep) then we can't say anything whatsoever. Hence when we ask a database a question, we are asking it what it knows, not about the underlying nature of the real world from which its propositions are recorded. To mistake this is to misunderstand the nature of a database.

Further, I agree with J M Davitt's post that we will likely be looking at propositions where a person's Mother or Father is sometimes unrecorded, giving us together:

Exists:P1 with sex:F
Exists:P2 with sex:M
Exists:P3 with sex:M with Mother:P1 and Father:P2
Exists:P4 with sex:F with Mother:P1 and Father:P2
Exists:P5 with sex:F with Mother:P4

Exists:P1 who has a Missing_Role:Father and Missing_Role:Mother
Exists:P2 who has a Missing_Role:Father and Missing_Role:Mother

Again this indicates another proposition structure, and so either more relations are required or the propositions' predication needs reworking. Either way I cannot emphasise enough how useful I have found writing down potential propositions to work out their commonalities before designing the base relations that will store them.
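
To make that concrete, here is a rough sketch of one way the propositions above might fall into base relations, using Python's sqlite3 module. The table and column names (mother_of, father_of, missing_role) and the helper father_status are my own assumptions for illustration, not a recommended design, and the "mother must be female" check is left out because a plain CHECK constraint cannot look into another table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default

# One relation per proposition structure (all names are illustrative assumptions).
conn.executescript("""
CREATE TABLE person (
    p   TEXT PRIMARY KEY,
    sex TEXT NOT NULL CHECK (sex IN ('F', 'M'))
);
CREATE TABLE mother_of (
    child  TEXT PRIMARY KEY REFERENCES person(p),
    mother TEXT NOT NULL REFERENCES person(p)
);
CREATE TABLE father_of (
    child  TEXT PRIMARY KEY REFERENCES person(p),
    father TEXT NOT NULL REFERENCES person(p)
);
CREATE TABLE missing_role (
    p    TEXT NOT NULL REFERENCES person(p),
    role TEXT NOT NULL CHECK (role IN ('Mother', 'Father')),
    PRIMARY KEY (p, role)
);
""")

with conn:
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [("P1", "F"), ("P2", "M"), ("P3", "M"), ("P4", "F"), ("P5", "F")])
    conn.executemany("INSERT INTO mother_of VALUES (?, ?)",
                     [("P3", "P1"), ("P4", "P1"), ("P5", "P4")])
    conn.executemany("INSERT INTO father_of VALUES (?, ?)",
                     [("P3", "P2"), ("P4", "P2")])
    conn.executemany("INSERT INTO missing_role VALUES (?, ?)",
                     [("P1", "Father"), ("P1", "Mother"),
                      ("P2", "Father"), ("P2", "Mother")])


def father_status(conn, p):
    """Report what the database knows about p's father, not what is true in the world."""
    if conn.execute("SELECT 1 FROM father_of WHERE child = ?", (p,)).fetchone():
        return "recorded"
    if conn.execute("SELECT 1 FROM missing_role WHERE p = ? AND role = 'Father'",
                    (p,)).fetchone():
        return "stated as missing"
    return "nothing stated"


print(father_status(conn, "P3"))  # recorded
print(father_status(conn, "P1"))  # stated as missing
print(father_status(conn, "P5"))  # nothing stated
```

Note that no column is nullable, and that P5 ends up in a third state: the database neither names her father nor says that he is missing, which is exactly the "we can't say anything whatsoever" case.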

>
> What do you think of the idea to favour direct representation of events
> in a RDB? It is my impression that this tends to lead to normalised
> designs that properly deal with the closed world assumption, avoid
> nulls, ensure simple updates, and makes it easy to think about strong
> integrity constraints.

I agree with the principles of your viewpoint - but I would not necessarily term it an 'event'. Codd called it a "relationship", although he still had to fit the concept into the relations he was already using in his 1969 paper. (Perhaps a better word would be 'observation'?) Either way, a term that has no semantic link to the attributes of an entity would be optimal. I'd heartily recommend you have a look at Object Role Modelling (a competitor to E/R modelling), as I believe it is underpinned by the same philosophy that you hold.

>
> A database used by a company as part of its running process had a
> beginning, and is updated as events happen. The idea to directly
> store the events seems very natural. The closed world assumption
> relates in part to a single interval on the time axis.
>
> > > Another solution is to drop the enforced referential integrity
> > > constraint. However it seems rather suspicious to pretend that some
> > > parent is not a person (in the DB) even though they are mentioned in
> > > the DB.
> > >
> > > A third solution is to regard the above person relation as bad because
> > > it is at odds with the closed world assumption. Instead, it is better
> > > to limit a person relation to something like
> > >
> > > person(P) :- P is a person
> > >
> > > and use other relations to represent the family tree, such as
> > >
> > > mother(M,C) :- M is the mother of child C
> > > father(F,C) :- F is the father of child C
> > >
> > > Note that as it stands we have quite weak integrity constraints because
> > > a person may have any number of mothers and fathers.
> > >
> > > Alternatively we could represent birth events
> > >
> > > birth(L,T,P,M,F) :- Person P was born to M,F at location L at time
> > > T
> > >
> > > This could be keyed on attribute P, ensuring that each person can be
> > > born at most once and therefore have at most one mother, one father,
> > > one birthplace and one age.
> > >
> > > Interestingly (and IMO not surprisingly), representing the underlying
> > > events that occur in space and time offers a good trade-off in terms of
> > > integrity constraints, and fits in well with the closed world
> > > assumption.
> > >
> > > Another perspective on this: relations represent facts not entities.
> > > The whole idea of RM is to represent information *about* entities using
> > > predicates. The idea that a record in a table represents an object
> > > has more to do with the OO approach.
> > >
> > > Consider that we store marriage information in a person relation
> > >
> > > Person(P,S) :-Person P has spouse S.
> > >
> > > Clearly the spouse attribute would need to be nullable. It would be
> > > better to store marriages in a separate relation, such as
> > >
> > > married(P1,P2) :- P1 and P2 are married
> > >
> > > But people can get married and divorced multiple times. To represent
> > > this it may be better to store the information using events. Eg
> > >
> > > wedding(L,T,P1,P2) :- P1 married P2 at location L at time T.
> > >
> > > There are some nice features about using events for relational models.
> > >
> > > 1. Events are immutable.
> > > 2. Events give us history
> > > 3. The relationships between entities can vary over time.
> > > 4. Events occur in space and time and therefore align well with some
> > > form of closed world assumption that is localized in space/time.
> > >
> > > However there is an increased computational burden if the current set
> > > of relationships have to be calculated from the events. This is a
> > > caching issue, such as when a bank caches an account balance.
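
As a footnote to the quoted text, the birth and wedding relations and the cost of computing current relationships from events could be sketched roughly as follows, again with Python's sqlite3. The divorce relation, the p1 < p2 ordering convention, the sample data and the current_marriages helper are all assumptions of mine; the original post only defines birth and wedding.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default

db.executescript("""
CREATE TABLE person (p TEXT PRIMARY KEY);
-- Keyed on p: each person is born at most once, so has at most one m and one f.
CREATE TABLE birth (
    l TEXT NOT NULL,
    t TEXT NOT NULL,
    p TEXT PRIMARY KEY REFERENCES person(p),
    m TEXT NOT NULL REFERENCES person(p),
    f TEXT NOT NULL REFERENCES person(p)
);
CREATE TABLE wedding (
    l  TEXT NOT NULL,
    t  TEXT NOT NULL,
    p1 TEXT NOT NULL REFERENCES person(p),
    p2 TEXT NOT NULL REFERENCES person(p),
    CHECK (p1 < p2),            -- assumed convention so a couple is stored one way round
    PRIMARY KEY (t, p1, p2)
);
-- Hypothetical relation, not in the original post.
CREATE TABLE divorce (
    t  TEXT NOT NULL,
    p1 TEXT NOT NULL REFERENCES person(p),
    p2 TEXT NOT NULL REFERENCES person(p),
    CHECK (p1 < p2),
    PRIMARY KEY (t, p1, p2)
);
""")

with db:
    db.executemany("INSERT INTO person VALUES (?)",
                   [("adam",), ("ann",), ("cain",), ("eve",)])
    # adam, ann and eve have no birth row: nothing is stated about their parents.
    db.execute("INSERT INTO birth VALUES ('Nod', '1950-01-01', 'cain', 'eve', 'adam')")
    db.executemany("INSERT INTO wedding VALUES (?, ?, ?, ?)",
                   [("Eden", "1940-06-01", "adam", "eve"),
                    ("Eden", "1965-06-01", "adam", "eve"),
                    ("Nod", "1975-06-01", "ann", "cain")])
    db.executemany("INSERT INTO divorce VALUES (?, ?, ?)",
                   [("1960-06-01", "adam", "eve"),
                    ("1980-06-01", "ann", "cain")])


def current_marriages(db):
    """Derive the current state: weddings with no later divorce for the same couple."""
    return db.execute("""
        SELECT w.p1, w.p2 FROM wedding w
        WHERE NOT EXISTS (SELECT 1 FROM divorce d
                          WHERE d.p1 = w.p1 AND d.p2 = w.p2 AND d.t > w.t)
        ORDER BY w.p1, w.p2
    """).fetchall()


print(current_marriages(db))  # [('adam', 'eve')]
```

The events are only ever inserted, never updated, and the history is kept for free. The price is that current_marriages has to scan the events on every call, which is where a cached "currently married" relation (like the bank's cached balance) would come in.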
Received on Tue Jan 09 2007 - 01:17:04 CET
