Re: Interpretation of Relations

From: Joe Thurbon <usenet_at_thurbon.com>
Date: Mon, 22 Jan 2007 10:30:30 GMT
Message-ID: <2007012220301150878-usenet_at_thurboncom>


On 2007-01-22 10:46:47 +1000, "JOG" <jog_at_cs.nott.ac.uk> said:
>
> This is extremely close to my line of research. As such this seems like
> a good opportunity to dig out something very similar, that started my
> line of thought in this direction. I had been attempting to look at the
> consequences of different encoding strategies for stating NL

NL is natural language?

> sentences
> as formal propositions, and the effects that the choices made have on
> the issue of missing information within the resulting data model. In
> the course of this I produced the following simple example of the
> effects of CWA and missing information that concerned me (I have
> reworked the example to correspond to the OP).

I think we're thinking along similar lines. I am still really new with the RM side, so I'm going to ask what might be pretty basic questions, and make what might be basic observations. Hopefully my understanding of the logic side might be able to pay you back (although I'm not really an expert there, just a lot more experienced than I am in the RM).

>
> * Consider the dual predicates Joes_hair(x) and Not_Joes_hair(x), and
> an RM representation of them with a trivial domain:

>
> Domain D_Hair = {Red}
> Relation R_Joes_Hair = <value: D_Hair>
> Relation R_Not_Joes_Hair = <value: D_Hair>
>
> * Constraints:
> C1 = FORALL x R_Joes_hair(x) <-> ~R_Not_Joes_hair(x)
> C2 = FORALL x R_Not_Joes_hair(x) <-> ~R_Joes_hair(x)

Is it possible to have these sorts of constraints in the RM? I thought that there was no 'inferencing' that went on inside the model.

What does ~R_Joes_hair(x) mean: just that x does not appear in the body of R_Joes_hair?

And finally, is this the standard way of handling predicates where you want to assert 'negative facts?'
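For concreteness, here is how I have been picturing it, as a throwaway Python sketch (the names and the set-based reading are mine, just an illustration, not standard RM machinery): a relation body is a plain set, R(x) is membership, ~R(x) is non-membership, and C1/C2 say exactly one of the two relations contains each domain element.

```python
# Sketch (my own reading): a relation body is just a set of
# values, and ~R_Joes_hair(x) is read purely as "x does not
# appear in the body of R_Joes_hair".
D_Hair = {"Red"}

R_Joes_Hair = {"Red"}      # encodes: Joe's hair is red
R_Not_Joes_Hair = set()    # must then be empty, by C1/C2

def holds(body, x):
    # R(x) is true exactly when x appears in the body
    return x in body

def satisfies_constraints(domain, pos, neg):
    # C1/C2 read as: for every x in the domain, exactly one of
    # R_Joes_hair(x) and R_Not_Joes_hair(x) holds
    return all(holds(pos, x) != holds(neg, x) for x in domain)

print(holds(R_Not_Joes_Hair, "Red"))   # False, i.e. ~R_Not_Joes_hair(Red)
print(satisfies_constraints(D_Hair, R_Joes_Hair, R_Not_Joes_Hair))  # True
```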

>
> * We obviously know from our encoding that:
> R_Joes_hair(value : x) -> Joes_hair(x)
> R_Not_Joes_hair(value : x) -> Not_Joes_hair(x)
>
> * Also, by the CWA we know that:
> ~ R_Joes_hair(value : x) -> ~Joes_hair(x)
> ~ R_Not_Joes_hair(value : x) -> ~Not_Joes_hair(x)

Just to check that I'm with you here, Joes_hair is a logical predicate and R_Joes_hair is a relation?

I had written down a couple of formulae somewhere which I think capture this notion. I'll put them at the bottom of this post, but yes, assuming that ~R_Joes_Hair(x) just means that x does not appear in R_Joes_Hair's body, I think this is the right way to interpret this logically.

>
> * Now, if Joe's hair is red one should encode:
> R_Joes_Hair = { (value:Red) }
> R_Not_Joes_Hair = { }
>
> * Or if he does not have red hair one encodes:
> R_Joes_Hair = { }
> R_Not_Joes_Hair = { (value:Red) }
>
> * However, I don't know Joe, so this information is missing, and this
> puts me in rather a spot. I cannot state R_Joes_Hair(Red) or
> R_Not_Joes_Hair(Red) because:
> Joes_Hair(Red) = UNKNOWN
> Not_Joes_Hair(Red) = UNKNOWN
>
> * But worse still, if I do nothing at all and insert no tuples I have:
> R_Joes_Hair = { }
> R_Not_Joes_Hair = { }
>
> * From the CWA we could then infer:
> ~Joes_hair(Red) ^ ~Not_Joes_Hair(Red)
> => ~Joes_hair(Red) ^ Joes_Hair(Red)
> => CONTRADICTION
Yes. Right.
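To make the contradiction concrete, here is a small self-contained Python sketch of that last step (the encoding is my own invention): both bodies empty, C1 read as an exclusive-or over the domain, and the CWA reading "empty body means false everywhere" immediately violates the constraint.

```python
# Sketch: both bodies empty. Under the CWA an empty body means
# the predicate is false for every domain element, so neither
# Joes_hair(Red) nor Not_Joes_hair(Red) holds -- but C1 demands
# that exactly one of them does: contradiction.
D_Hair = {"Red"}
R_Joes_Hair = set()
R_Not_Joes_Hair = set()

def satisfies_c1(domain, pos, neg):
    # C1: FORALL x, pos(x) <-> ~neg(x), i.e. exactly one holds
    return all((x in pos) != (x in neg) for x in domain)

print(satisfies_c1(D_Hair, R_Joes_Hair, R_Not_Joes_Hair))  # False
```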

>
> This frustrated me somewhat when I first jotted it down, and even if it
> is missing a trick, it has given me some useful insights into how the
> issue might be addressed through a description of 'facts /about/ our
> knowledge of the world' (as you put it) via a SOL formalization - I'm
> not sure that modal logic is necessary in the db-algebra itself.

I agree. I think that the last set of assertions in your example (where you derive the contradiction using the CWA) is already outside the RM. And if it is, my intuition is that the root of all the problems with missing information is using the CWA to induce negative information. It leaves your logical interpretation completely asymmetric: you need 'special constraints' like C1 and C2 to relate R_Joes_Hair and R_Not_Joes_Hair, whereas Joes_hair and ~Joes_hair are logical complements 'for free'.

So, what are the alternatives?

I think there are quite a few.

For example, we could extend the notion of relation to include two bodies. (I don't think that this is the right way to go about it in general, but it's a starting point)

For example,

Domain D_People = {Joe}
Domain D_Hair = {Red, Blond, Black}

Relation R_Hair_Colour = <D_People X D_Hair : {(Joe, Blond)} : {(Joe, Red)}>

would indicate that

Joe's hair is blond (it is in the 'positive' body),
Joe's hair is not red (it is in the 'negative' body), and
whether Joe's hair is black or not is unknown.

Now, I am aware that this approach, followed purely as stated, would be completely intractable in practice. For example, many domains are just too big to enumerate all the possibilities. But at least it gets rid of the CWA.
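Here is a toy Python sketch of the two-body idea (the names and the three-valued lookup are my own invention, just to show the mechanics):

```python
# Sketch of a "two-body" relation: one set of tuples asserted
# true, one asserted false; anything in the domain appearing in
# neither body is unknown. No CWA needed, at the cost of
# storing negatives explicitly.
D_People = {"Joe"}
D_Hair = {"Red", "Blond", "Black"}

positive = {("Joe", "Blond")}   # Joe's hair IS blond
negative = {("Joe", "Red")}     # Joe's hair is NOT red

def hair_colour(person, colour):
    if (person, colour) in positive:
        return True
    if (person, colour) in negative:
        return False
    return None  # unknown: asserted in neither body

print(hair_colour("Joe", "Blond"))  # True
print(hair_colour("Joe", "Red"))    # False
print(hair_colour("Joe", "Black"))  # None (unknown)
```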

Another strategy would be to slightly weaken the CWA in some circumstances. More below.

> However I am a long way from being a logician and as such have a
> healthy skepticism of the validity of absolutely any maths I generate,
> so any critical analysis is /more/ than welcome.

It all seemed to be right to me. A more general characterization might be like this:

(The below is for unary relations over finite domains, because usenet is a plain text medium and the notation is difficult enough already, but the n-ary case is the natural extension):

A relation R is defined over a Domain D = {d1, ..., dn}, with a Body B containing zero or more 1-tuples, each tuple containing one element of D.

We then define a logical predicate L, also over D, whose truth value for each di in D is given by

L(di) iff "di is an element of B", where B is the body of R.

The "iff" effectively closes the predicate over D, so the closed world assumption happens at the logic level, rather than the algebraic level.
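As a sketch, the "iff" closure can be spelled out like this (hypothetical Python, unary case, names mine):

```python
# The relation itself is just a body listing tuples. The logical
# predicate L closes it over D: L(d) holds iff d appears in the
# body, so every element of D gets a definite truth value -- the
# CWA happens at the logic level, not the algebraic level.
D = {"d1", "d2", "d3"}
B = {"d1"}  # body of R

def L(d):
    assert d in D, "L is only defined over D"
    return d in B  # the "iff": membership decides truth

# every element of D comes out either true or false; nothing
# is left unknown once the predicate is closed
print(sorted((d, L(d)) for d in D))
```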

So far, this is identical to your example above. One 'workaround' to the missing problem is:

For concepts which are 'facts about the world' L(di) is just a logical assertion.

For concepts which are 'facts about our knowledge about the world' I'd use modal logic (more precisely something like epistemic logic) and say that L(di) should actually be thought of as

K(L(di))

where K is the modal operator 'Known'.

Why is this interesting (well, at least I think it is)?

K(L(di)) -> L(di)

That is, all the stuff that you "know" is true.

but, and this is important

~K(L(di)) does not entail ~L(di)

That is, just because you don't "know" something, it does not follow that it is false.

This gives a nice consistent interpretation of missing information, keeps the CWA around, and doesn't change the meaning of the 'positive examples' in the body of R. Of course, sometimes, you _do_ want to entail ~L(d) from d missing from R's body. In this case, you just choose your L's interpretation to be a standard logical one.
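The two interpretations can be put side by side in a toy Python sketch (names invented by me, purely illustrative):

```python
# Sketch: two readings of "x is missing from R's body".
# A 'fact' relation uses the standard CWA reading: missing => false.
# A 'known' relation uses the modal reading: missing => unknown,
# since ~K(L(x)) does not entail ~L(x).
def interpret(body, x, modal):
    if x in body:
        return True          # K(L(x)) -> L(x): what is known is true
    return None if modal else False

R = {"Red"}
print(interpret(R, "Red", modal=True))     # True: known, hence true
print(interpret(R, "Blond", modal=False))  # False: CWA, a 'fact' relation
print(interpret(R, "Blond", modal=True))   # None: merely not known
```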

A few posts ago, you made a closing remark that query results are 'as far as the DB knows'. The idea that some relations should be interpreted modally allows a model to make that notion explicit, and selectively so. Note that none of this modal treatment actually affects the relational model, just what happens when you start trying to do some inference with the facts as stated in relations.

What I'm trying to get a handle on now is whether it is (a) correct, (b) useful, and (c) whether it can be incorporated into the model. For example, you might consider how 'known' and 'fact' relations behave under joins. Another consideration is how many of these modal operators are needed. Possibly as many as there are different faces of NULL? Actually, before I get to (c), I'm not sure if it would be better to just leave it out of the RM altogether and keep it in the inferencing part of the 'system'.

Anyway, I've rambled on quite a bit. The ideas are pretty new to me, still in development, and really, I'm getting ahead of myself because I still don't fully understand the RM. It's nice to see that someone has at least had a similar idea, too.

Does any of this make any sense to you? To anyone?

Cheers,
Joe

Received on Mon Jan 22 2007 - 11:30:30 CET
