Re: Interpretation of Relations

From: Joe Thurbon <usenet_at_thurbon.com>
Date: Mon, 22 Jan 2007 10:30:30 GMT
Message-ID: <2007012220301150878-usenet_at_thurboncom>


On 2007-01-22 10:46:47 +1000, "JOG" <jog_at_cs.nott.ac.uk> said:
>
> This is extremely close to my line of research. As such this seems like
> a good opportunity to dig out something very similar, that started my
> line of thought in this direction. I had been attempting to look at the
> consequences of different encoding strategies for stating NL

NL is natural language?

> sentences
> as formal propositions, and the effects that the choices made have on
> the issue of missing information within the resulting data model. In
> the course of this I produced the following simple example of the
> effects of CWA and missing information that concerned me (I have
> reworked the example to correspond to the OP).

I think we're thinking along similar lines. I am still really new with the RM side, so I'm going to ask what might be pretty basic questions, and make what might be basic observations. Hopefully my understanding of the logic side might be able to pay you back (although I'm not really an expert there, just a lot more experienced than I am in the RM).

>
> * Consider the dual predicates Joes_hair(x) and Not_Joes_hair(x), and
> an RM representation of them with a trivial domain:

>
> Domain D_Hair = {Red}
> Relation R_Joes_Hair = <value: D_Hair>
> Relation R_Not_Joes_Hair = <value: D_Hair>
>
> * Constraints:
> C1 = FORALL x R_Joes_hair(x) <-> ~R_Not_Joes_hair(x)
> C2 = FORALL x R_Not_Joes_hair(x) <-> ~R_Joes_hair(x)

Is it possible to have these sorts of constraints in the RM? I thought that there was no 'inferencing' that went on inside the model.

What does ~R_Joes_hair(x) mean: just that x does not appear in the body of R_Joes_hair?

And finally, is this the standard way of handling predicates where you want to assert 'negative facts?'
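For concreteness, here is how I have been picturing it, as a throwaway Python sketch (the names and the set-based reading are mine, just an illustration, not standard RM machinery): a relation body is a plain set, R(x) is membership, ~R(x) is non-membership, and C1/C2 say exactly one of the two relations contains each domain element.

```python
# Sketch (my own reading): a relation body is just a set of
# values, and ~R_Joes_hair(x) is read purely as "x does not
# appear in the body of R_Joes_hair".
D_Hair = {"Red"}

R_Joes_Hair = {"Red"}      # encodes: Joe's hair is red
R_Not_Joes_Hair = set()    # must then be empty, by C1/C2

def holds(body, x):
    # R(x) is true exactly when x appears in the body
    return x in body

def satisfies_constraints(domain, pos, neg):
    # C1/C2 read as: for every x in the domain, exactly one of
    # R_Joes_hair(x) and R_Not_Joes_hair(x) holds
    return all(holds(pos, x) != holds(neg, x) for x in domain)

print(holds(R_Not_Joes_Hair, "Red"))   # False, i.e. ~R_Not_Joes_hair(Red)
print(satisfies_constraints(D_Hair, R_Joes_Hair, R_Not_Joes_Hair))  # True
```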

>
> * We obviously know from our encoding that:
> R_Joes_hair(value : x) -> Joes_hair(x)
> R_Not_Joes_hair(value : x) -> Not_Joes_hair(x)
>
> * Also, by the CWA we know that:
> ~ R_Joes_hair(value : x) -> ~Joes_hair(x)
> ~ R_Not_Joes_hair(value : x) -> ~Not_Joes_hair(x)

Just to check that I'm with you here, Joes_hair is a logical predicate and R_Joes_hair is a relation?

I had written down a couple of formulae somewhere which I think capture this notion. I'll put them at the bottom of this post, but yes, assuming that ~R_Joes_Hair(x) just means that x does not appear in R_Joes_Hair's body, I think this is the right way to interpret this logically.

>
> * Now, if Joe's hair is red one should encode:
> R_Joes_Hair = { (value:Red) }
> R_Not_Joes_Hair = { }
>
> * Or if he does not have red hair one encodes:
> R_Joes_Hair = { }
> R_Not_Joes_Hair = { (value:Red) }
>
> * However, I don't know Joe, so this information is missing, and this
> puts me in rather a spot. I cannot state R_Joes_Hair(Red) or
> R_Not_Joes_Hair(Red) because:
> Joes_Hair(Red) = UNKNOWN
> Not_Joes_Hair(Red) = UNKNOWN
>
> * But worse still, if I do nothing at all and insert no tuples I have:
> R_Joes_Hair = { }
> R_Not_Joes_Hair = { }
>
> * From the CWA we could then infer:
> ~Joes_hair(Red) ^ ~Not_Joes_Hair(Red)
> => ~Joes_hair(Red) ^ Joes_Hair(Red)
> => CONTRADICTION
Yes. Right.
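To make the contradiction concrete, here is a small self-contained Python sketch of that last step (the encoding is my own invention): both bodies empty, C1 read as an exclusive-or over the domain, and the CWA reading "empty body means false everywhere" immediately violates the constraint.

```python
# Sketch: both bodies empty. Under the CWA an empty body means
# the predicate is false for every domain element, so neither
# Joes_hair(Red) nor Not_Joes_hair(Red) holds -- but C1 demands
# that exactly one of them does: contradiction.
D_Hair = {"Red"}
R_Joes_Hair = set()
R_Not_Joes_Hair = set()

def satisfies_c1(domain, pos, neg):
    # C1: FORALL x, pos(x) <-> ~neg(x), i.e. exactly one holds
    return all((x in pos) != (x in neg) for x in domain)

print(satisfies_c1(D_Hair, R_Joes_Hair, R_Not_Joes_Hair))  # False
```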

>
> This frustrated me somewhat when I first jotted it down, and even if it
> is missing a trick, it has given me some useful insights into how the
> issue might be addressed through a description of 'facts /about/ our
> knowledge of the world' (as you put it) via a SOL formalization - I'm
> not sure that modal logic is necessary in the db-algebra itself.

I agree. I think that the last set of assertions in your example (where you derive the contradiction using the CWA) is already outside the RM. And if it is, my intuition is that the root of all the problems with missing information is using the CWA to induce negative information. It leaves your logical interpretation completely asymmetric: you need 'special constraints' like C1 and C2 to relate R_Joes_Hair and R_Not_Joes_Hair, whereas Joes_hair and ~Joes_hair are logical complements 'for free'.

So, what are the alternatives?

I think there are quite a few.

For example, we could extend the notion of relation to include two bodies. (I don't think that this is the right way to go about it in general, but it's a starting point)

For example,

Domain D_People = {Joe}
Domain D_Hair = {Red, Blond, Black}

Relation R_Hair_Colour = <D_People X D_Hair : {(Joe, Blond)} : {(Joe, Red)}>

would indicate that

Joe's hair is blond (it is in the 'positive' body),
Joe's hair is not red (it is in the 'negative' body), and
whether Joe's hair is black or not is unknown.

Now, I am aware that this approach, followed purely as stated, would be completely intractable in practice. For example, many domains are just too big to enumerate all the possibilities. But at least it gets rid of the CWA.
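Here is a toy Python sketch of the two-body idea (the names and the three-valued lookup are my own invention, just to show the mechanics):

```python
# Sketch of a "two-body" relation: one set of tuples asserted
# true, one asserted false; anything in the domain appearing in
# neither body is unknown. No CWA needed, at the cost of
# storing negatives explicitly.
D_People = {"Joe"}
D_Hair = {"Red", "Blond", "Black"}

positive = {("Joe", "Blond")}   # Joe's hair IS blond
negative = {("Joe", "Red")}     # Joe's hair is NOT red

def hair_colour(person, colour):
    if (person, colour) in positive:
        return True
    if (person, colour) in negative:
        return False
    return None  # unknown: asserted in neither body

print(hair_colour("Joe", "Blond"))  # True
print(hair_colour("Joe", "Red"))    # False
print(hair_colour("Joe", "Black"))  # None (unknown)
```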

Another strategy would be to slightly weaken the CWA in some circumstances. More below.

> However I am a long way from being a logician and as such have a
> healthy skepticism of the validity of absolutely any maths I generate,
> so any critical analysis is /more/ than welcome.

It all seemed to be right to me. A more general characterization might be like this:

(The below is for unary relations over finite domains, because usenet is a plain text medium and the notation is difficult enough already, but the n-ary case is the natural extension):

A relation R is defined over a Domain D = {d1, ..., dn}, with a Body B containing zero or more 1-tuples, each tuple containing one element of D.

We then define a logical predicate L, also over D, whose truth value for each di in D is given by

L(di) iff "di is an element of B", where B is the body of R.

The "iff" effectively closes the predicate over D, so the closed world assumption happens at the logic level, rather than the algebraic level.
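As a sketch, the "iff" closure can be spelled out like this (hypothetical Python, unary case, names mine):

```python
# The relation itself is just a body listing tuples. The logical
# predicate L closes it over D: L(d) holds iff d appears in the
# body, so every element of D gets a definite truth value -- the
# CWA happens at the logic level, not the algebraic level.
D = {"d1", "d2", "d3"}
B = {"d1"}  # body of R

def L(d):
    assert d in D, "L is only defined over D"
    return d in B  # the "iff": membership decides truth

# every element of D comes out either true or false; nothing
# is left unknown once the predicate is closed
print(sorted((d, L(d)) for d in D))
```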

So far, this is identical to your example above. One 'workaround' to the missing problem is:

For concepts which are 'facts about the world' L(di) is just a logical assertion.

For concepts which are 'facts about our knowledge about the world' I'd use modal logic (more precisely something like epistemic logic) and say that L(di) should actually be thought of as

K(L(di))

where K is the modal operator 'Known'.

Why is this interesting (well, at least I think it is)?

K(L(di)) -> L(di)

That is, all the stuff that you "know" is true.

but, and this is important

~K(L(di)) does not entail ~L(di)

That is, just because you don't "know" something, it does not follow that it is false.

This gives a nice consistent interpretation of missing information, keeps the CWA around, and doesn't change the meaning of the 'positive examples' in the body of R. Of course, sometimes, you _do_ want to entail ~L(d) from d missing from R's body. In this case, you just choose your L's interpretation to be a standard logical one.
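The two interpretations can be put side by side in a toy Python sketch (names invented by me, purely illustrative):

```python
# Sketch: two readings of "x is missing from R's body".
# A 'fact' relation uses the standard CWA reading: missing => false.
# A 'known' relation uses the modal reading: missing => unknown,
# since ~K(L(x)) does not entail ~L(x).
def interpret(body, x, modal):
    if x in body:
        return True          # K(L(x)) -> L(x): what is known is true
    return None if modal else False

R = {"Red"}
print(interpret(R, "Red", modal=True))     # True: known, hence true
print(interpret(R, "Blond", modal=False))  # False: CWA, a 'fact' relation
print(interpret(R, "Blond", modal=True))   # None: merely not known
```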

A few posts ago, you made a closing remark that query results are 'as far as the DB knows'. The idea that some relations should be interpreted modally allows a model to make that notion explicit, and selectively so. Note that none of this modal treatment actually affects the relational model, just what happens when you start trying to do some inference with the facts as stated in relations.

What I'm trying to get a handle on now is whether it is (a) correct, (b) useful, and (c) whether it can be incorporated into the model. For example, you might consider how 'known' and 'fact' relations behave under joins. Another consideration is how many of these modal operators are needed. Possibly as many as there are different faces of NULL? Actually, before I get to (c), I'm not sure if it would be better to just leave it out of the RM altogether and keep it in the inferencing part of the 'system'.

Anyway, I've rambled on quite a bit. The ideas are pretty new to me, still in development, and really, I'm getting ahead of myself because I still don't fully understand the RM. It's nice to see that someone has at least had a similar idea, too.

Does any of this make any sense to you? To anyone?

Cheers,
Joe

Received on Mon Jan 22 2007 - 11:30:30 CET
