Re: Interpretation of Relations

From: JOG <jog_at_cs.nott.ac.uk>
Date: 22 Jan 2007 09:13:38 -0800
Message-ID: <1169486017.555930.269660_at_l53g2000cwa.googlegroups.com>


Joe Thurbon wrote:

> On 2007-01-22 10:46:47 +1000, "JOG" <jog_at_cs.nott.ac.uk> said:
> >
> > This is extremely close to my line of research. As such this seems like
> > a good opportunity to dig out something very similar, that started my
> > line of thought in this direction. I had been attempting to look at the
> > consequences of different encoding strategies for stating NL
>
> NL is natural language?
>
> > sentences
> > as formal propositions, and the effects that the choices made have on
> > the issue of missing information within the resulting data model. In
> > the course of this I produced the following simple example of the
> > effects of CWA and missing information that concerned me (I have
> > reworked the example to correspond to the OP).
>
> I think we're thinking along similar lines. I am still really new with
> the RM side, so I'm going to ask what might be pretty basic questions,
> and make what might be basic observations. Hopefully my understanding
> of the logic side might be able to pay you back (although I'm not
> really an expert there, just a lot more experienced than I am in the
> RM).
>
> >
> > * Consider the dual predicates Joes_hair(x) and Not_Joes_hair(x), and
> > an RM representation of them with a trivial domain:
>
>
> >
> > Domain D_Hair = {Red}
> > Relation R_Joes_Hair = <value: D_Hair>
> > Relation R_Not_Joes_Hair = <value: D_Hair>
> >
> > * Constraints:
> > C1 = FORALL x R_Joes_hair(x) <-> ~R_Not_Joes_hair(x)
> > C2 = FORALL x R_Not_Joes_hair(x) <-> ~R_Joes_hair(x)
>
> Is it possible to have these sorts of constraints in the RM? I thought
> that there was no 'inferencing' that went on inside the model.
>
> What does ~R_Joes_hair(x) mean: just that x does not appear in the body
> of R_Joes_hair?
>
> And finally, is this the standard way of handling predicates where you
> want to assert 'negative facts?'
>
> >
> > * We obviously know from our encoding that:
> > R_Joes_hair(value : x) -> Joes_hair(x)
> > R_Not_Joes_hair(value : x) -> Not_Joes_hair(x)
> >
> > * Also, by the CWA we know that:
> > ~ R_Joes_hair(value : x) -> ~Joes_hair(x)
> > ~ R_Not_Joes_hair(value : x) -> ~Not_Joes_hair(x)
>
> Just to check that I'm with you here, Joes_hair is is logical predicate
> and R_Joes_hair is a relation?
>
> I had written down a couple of formulas somewhere which I think capture
> this notion. I'll put it at the bottom of this post, but yes, assuming
> that ~R_Joes_Hair(x) just means that x does not appear in R_Joes_Hair's
> body, I think this is the right way to interpret this logically.
>
>
> >
> > * Now, if Joe's hair is red one should encode:
> > R_Joes_Hair = { (value:Red) }
> > R_Not_Joes_Hair = { }
> >
> > * Or if he does not have red hair one encodes:
> > R_Joes_Hair = { }
> > R_Not_Joes_Hair = { (value:Red) }
> >
> > * However, I don't know Joe, so this information is missing, and this
> > puts me in rather a spot. I cannot state R_Joes_Hair(Red) or
> > R_Not_Joes_Hair(Red) because:
> > Joes_Hair(Red) = UNKNOWN
> > Not_Joes_Hair(Red) = UNKNOWN
> >
> > * But worse still, if I do nothing at all and insert no tuples I have:
> > R_Joes_Hair = { }
> > R_Not_Joes_Hair = { }
> >
> > * From the CWA we could infer:
> > ~Joes_hair(Red) ^ ~Not_Joes_Hair(Red)
> > => ~Joes_hair(Red) ^ Joes_Hair(Red)
> > => CONTRADICTION
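[To make that derivation concrete, here is a tiny executable sketch of it, with the relations as Python sets and the CWA applied as plain set membership. All the names are mine, not part of the model:]

```python
# Sketch: both relations empty, CWA = "absent from the body means false".
R_Joes_Hair = set()      # { }
R_Not_Joes_Hair = set()  # { }

def joes_hair(x):        # CWA: ~R_Joes_Hair(x) -> ~Joes_hair(x)
    return x in R_Joes_Hair

def not_joes_hair(x):    # CWA: ~R_Not_Joes_Hair(x) -> ~Not_Joes_hair(x)
    return x in R_Not_Joes_Hair

# Constraint C1 demands Joes_hair(x) <-> ~Not_Joes_hair(x). With both
# relations empty the CWA yields ~Joes_hair(Red) ^ ~Not_Joes_hair(Red),
# which violates C1: the contradiction above.
c1_holds = joes_hair("Red") == (not not_joes_hair("Red"))
print(c1_holds)  # False: the constraint is violated
```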
>
> Yes. Right.
>
> >
> > This frustrated me somewhat when I first jotted it down, and even if it
> > is missing a trick, it has given me some useful insights into how the
> > issue might be addressed through a description of 'facts /about/ our
> > knowledge of the world' (as you put it) via a SOL formalization - I'm
> > not sure that modal logic is necessary in the db-algebra itself.
>
> I agree. I think that the last set of assertions made in your example
> (where you derive the contradiction using the CWA) is already outside
> the RM. And if it is, my intuition is that really the root of all the
> problems with missing information is using the CWA to induce negative
> information. It ends up with your logical interpretation being
> completely asymmetric. In particular, you need to have these 'special
> constraints' to infer a relationship between Hair_Colour and
> Not_Hair_Colour, whereas Hair_Colour and ~Hair_Colour are logical
> complements 'for free'.
>
> So, what are the alternatives?
>
> I think there are quite a few.
>
> For example, we could extend the notion of relation to include two
> bodies. (I don't think that this is the right way to go about it in
> general, but it's a starting point)
>
> For example,
>
> Domain D_People = {Joe}
> Domain D_Hair = {Red, Blond, Black}
>
> Relation R_Hair_Colour = <<D_People X D_Hair>: {(Joe, Blond)}: {(Joe, Red)}>
>
> would indicate that
>
> Joes hair is blond (it is in the 'positive' body)
> Joes hair is not red (it is in the 'negative' body)
> Whether Joe's hair is black or not is unknown.
>
> Now, I am aware that this approach, followed purely as stated, would be
> completely intractable in practice. For example, many domains are just
> too big to enumerate all the possibilities. But at least it gets rid of
> the CWA.
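[A quick executable sketch of that two-body idea, just to make the three possible answers concrete. The representation and names are my own, not a proposal for the algebra itself:]

```python
# Sketch: a relation carrying a positive and a negative body; anything
# appearing in neither body is UNKNOWN -- no CWA is applied.
D_People = {"Joe"}
D_Hair = {"Red", "Blond", "Black"}

positive = {("Joe", "Blond")}   # asserted true
negative = {("Joe", "Red")}     # asserted false

def hair_colour(person, colour):
    if (person, colour) in positive:
        return True
    if (person, colour) in negative:
        return False
    return None  # unknown: absent from both bodies

print(hair_colour("Joe", "Blond"))  # True
print(hair_colour("Joe", "Red"))    # False
print(hair_colour("Joe", "Black"))  # None (unknown)
```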
>
> Another strategy would be to slightly weaken the CWA in some
> circumstances. More below.
>
> > However I am a long way from being a logician and as such have a
> > healthy skepticism of the validity of absolutely any maths I generate,
> > so any critical analysis is /more/ than welcome.
>
> It all seemed to be right to me. A more general characterization might
> be like this:
>
> (The below is for unary relations over finite domains, because usenet
> is a plain text medium and the notation is difficult enough already,
> but the n-ary case is the natural extension):
>
> A relation R is defined over a Domain D = <d1, ..., dn>, with a Body B,
> containing 0 or more 1-ary tuples, each tuple containing one element of
> D.
>
> We then define a logical predicate L, also defined over D, whose truth
> value for each di in D is defined as
>
> L(di) iff "di is an element of B", where B is the body of R.
>
> The "iff" effectively closes the predicate over D, so the closed world
> assumption happens at the logic level, rather than the algebraic level.
>
> So far, this is identical to your example above. One 'workaround' to
> the missing-information problem is:
>
> For concepts which are 'facts about the world' L(di) is just a logical
> assertion.
>
> For concepts which are 'facts about our knowledge about the world' I'd
> use modal logic (more precisely something like epistemic logic) and say
> that L(di) should actually be thought of as
>
> K(L(di))
>
> where K is the modal operator 'Known'.
>
> Why is this interesting (well, at least I think it is)?
>
> K(L(di)) -> L(di)
>
> That is, all the stuff that you "know" is true.
>
> but, and this is important
>
> ~K(L(di)) does not entail ~L(di)
>
> That is, just because you don't "know" something, it doesn't follow
> that it's false.
>
> This gives a nice consistent interpretation of missing information,
> keeps the CWA around, and doesn't change the meaning of the 'positive
> examples' in the body of R. Of course, sometimes, you _do_ want to
> entail ~L(d) from d missing from R's body. In this case, you just
> choose your L's interpretation to be a standard logical one.
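[To pin down the difference between the two interpretations, here is a small sketch, entirely my own framing, where a 'standard' relation keeps the CWA and a 'modal' one only reports what is Known:]

```python
# Sketch: two interpretations of absence from a relation's body.
# closed=True applies the CWA (absent => false); closed=False gives the
# modal reading (absent => unknown, not false).
def interpret(body, x, closed):
    if x in body:
        return True  # K(L(x)) -> L(x): what is known is true
    return False if closed else None  # ~K(L(x)) does not entail ~L(x)

R_Joes_Hair = {"Blond"}
print(interpret(R_Joes_Hair, "Blond", closed=False))  # True
print(interpret(R_Joes_Hair, "Red", closed=False))    # None: unknown
print(interpret(R_Joes_Hair, "Red", closed=True))     # False: CWA
```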
>
> A few posts ago, you made a closing remark that query results are 'as
> far as the DB knows'. The idea that some relations should be
> interpreted modally allows a model to make that notion explicit and
> selective. Note that none of this modal treatment actually affects
> the relational model, just what happens when you start trying to do some
> inference with the facts as stated in relations.
>
> What I'm trying to get a handle on now is whether it is (a) correct,
> (b) useful, and (c) can it be incorporated into the model. For example,
> you might consider how 'known' and 'fact' relations behave under
> joins. Another consideration is, how many of these modal operators are
> needed? Possibly as many as there are different faces of NULL?
> Actually, before I get to (c), I'm not sure if it would be better to
> just leave it out of the RM altogether, and keep it in the inferencing
> part of the 'system'.
>
> Anyway, I've rambled on quite a bit. The ideas are pretty new to me,
> still in development, and really, I'm getting ahead of myself because I
> still don't fully understand the RM. It's nice to see that someone has
> at least had a similar idea, too.
>
> Does any of this make any sense to you? To anyone?

A couple of things: relational encoding requires that attributes are named, and this is an important consideration not to leave out of your syntax and analysis, as it has a significant impact on the mathematics of the model.

As Bob pointed out, the relational algebra allows one to generate new propositions from those already stated, so this is the mechanism by which inferencing is performed.
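For concreteness, a minimal sketch (entirely my own toy code, not any particular algebra implementation) of how a natural join derives a proposition that was never explicitly stated:

```python
# Sketch: a natural join over attribute-named tuples (dicts) derives new
# propositions from stated ones -- e.g. Supplies(s, p) joined with
# Contains(p, m) yields "supplier s supplies something containing m".
supplies = [{"s": "S1", "p": "Bolt"}]
contains = [{"p": "Bolt", "m": "Steel"}]

def natural_join(r1, r2):
    out = []
    for t1 in r1:
        for t2 in r2:
            shared = set(t1) & set(t2)  # common attribute names
            if all(t1[a] == t2[a] for a in shared):
                out.append({**t1, **t2})
    return out

print(natural_join(supplies, contains))
# [{'s': 'S1', 'p': 'Bolt', 'm': 'Steel'}]
```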

The idea of incorporating modal logic is very interesting, but I'd note that given that modal logic is reducible to first-order logic (I'm sure I have read this, but don't grill me on it), there would be a response that the RM is already capable of representing the desired results as is, and any extra layers provided to incorporate it would require careful thought indeed.

For what it's worth, I believe the crucial point is that the data model should not be trying to model facts from the real world, but rather our knowledge of those facts. This sounds like drivel to start with, but it does have an impact on representation: say we have a proposition from the real world, which has three roles x, y and z, and three corresponding values a, b and c. The RM as it stands would represent this proposition directly as a tuple:

P(x:a, y:b, z:c)

whereas I believe a tuple should perhaps represent it 'indirectly' as a compound predicate:

Exists p x(p,a) ^ y(p,b) ^ z(p,c)

I believe the consequences of this subtle change in interpretation of what we are 'recording' (facts - or statements /about/ facts) _may_ be able to remove a lot of the logical errors generated by missing information, and perhaps some other issues too. But don't quote me on that.
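To illustrate what I mean by the indirect encoding (a sketch only, with names and representation invented by me): the single tuple becomes a set of binary role facts, so missing information simply produces fewer facts rather than a tuple with a hole in it.

```python
# Sketch: P(x:a, y:b, z:c) recorded as binary role facts about an
# existentially quantified 'p', i.e. {x(p,a), y(p,b), z(p,c)}.
def decompose(entity_id, **roles):
    # Each known role/value pair becomes one fact; unknown roles are
    # simply not stated, rather than being stated with a NULL.
    return {(role, entity_id, value) for role, value in roles.items()}

full = decompose("p1", x="a", y="b", z="c")
partial = decompose("p2", x="a")  # y and z unknown: just omitted
print(sorted(full))
print(sorted(partial))
```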

Regards, Jim.

>
> Cheers,
> Joe
Received on Mon Jan 22 2007 - 18:13:38 CET
