Re: RM formalism supporting partial information

From: David BL <davidbl_at_iinet.net.au>
Date: Sun, 25 Nov 2007 23:52:09 -0800 (PST)
Message-ID: <5054ea55-ae41-4ca7-bae1-36a9885f0f78_at_s12g2000prg.googlegroups.com>


On Nov 24, 3:56 am, Jan Hidders <hidd..._at_gmail.com> wrote:
> On 21 nov, 04:05, David BL <davi..._at_iinet.net.au> wrote:
>
> > By consistency I only mean the inability to derive self contradictions
> > that would make it impossible for the definitions to be satisfied in
> > the first place. Of course it is normally impossible to prove
> > consistency. I think this is quite different to your usage of
> > "consistency" in the above. Is that right?
>
> Yes. I meant that you are not consistently using or defining a single
> interpretation of missing data but rather seem to taking sometimes one
> and sometimes the other depending on what make the math looks cute.
> That approach is likely to end in elegant but meaningless results.

AFAIK you say this because of something to do with the CWA but I can't get past the feeling that the CWA is incompatible with partial information and that's all there is to it! Perhaps you could give me an example to back up that statement.

I have wondered whether what you're after is a model that allows the CWA to apply to some projections and not to others. Well I agree, but I still think it's interesting to investigate the simpler case of where the CWA is never applicable.

> > I would say that a model of partial information will have trouble
> > formalising consistency or completeness in the following sense: If we
> > imagine an omniscient, complete database then we can compare the
> > result of a query against this hypothetical database. Any reasonable
> > definition of relation difference will be judged as neither
> > necessarily consistent nor complete with respect to the hypothetical
> > database. Despite this we still would like a formalism that in
> > practice is useful for dealing with partial information. Perhaps as
> > well, we will use the formalism (with its well understood
> > mathematical properties) as a basis for our intuition rather than the
> > reverse!
>
> The thing is that you are doing it the wrong way around. You should
> first define in some formalism like logic what the data means (again,
> see Reiter for how this is done), and only *then* you should start
> thinking about the operators, which ones you want, and what they
> exactly mean.
>
> > Your reference to completeness above seems fairly easy to meet. The
> > algebra can be mapped to corresponding proof rules. In other words
> > given the algebra we can define the subset of FO formulae that can be
> > proven true (or false). I agree that this is the basis on which we
> > understand the meaning of the formal data model.
>
> I don't think that's what I said. When formalizing the meaning the
> present and missing data the algebra operators should not be in the
> picture.
>
> > I have wondered whether the interpretation should state that the CWA
> > applies everywhere and nulls are always interpreted as non-
> > existence. Under that formalism we have not as you say "left that
> > safe and well-understood realm" by allowing the value-unknown
> > interpretation. Note for example that it is on this formal basis
> > that a restricted form of negation can be defined. As far as the
> > formalism is concerned we stay strictly within the confines of a 2vl.
>
> Correct.
>
> > Practically speaking (ie in the presence of partial information) the
> > user should think of negation as failure to prove true.
>
> No, no, no. That's a very sloppy and confusing way of formulating a
> CWA. E.g., it might depend on the effectiveness of your theorem prover
> whether something is considered true or not.

Isn't the whole point that there isn't a CWA in the presence of partial information? Closed = Complete! What else could closed mean?

I might agree with your sentiment more generally for deductive databases, but for the restrictive logical inference used in RM, thinking of negation as failure to prove true seems simple and intuitive.

It seems to me either you define negation as failure to prove true, or disallow negation completely (in the presence of partial information).

> > We must somehow deal with the inconsistency between a formal
> > interpretation of null as non-existence, and the user of the DB that
> > may choose to *weaken* it to non-information. However, I think the
> > only issue is that the user must be careful to weaken the
> > interpretation of query results accordingly, and the way to do this is
> > fairly straightforward.
>
> The user is not the problem here, you are. :-) You first need to
> figure out what the options are that you want to offer to the user and
> understand their consequences. The best way to do that is by starting
> to formalize them properly, preferably in terms of a logical theory.
> Again, see Reiter for how this is done.
>
> > Of course this needs some justification. As an illustrative example
> > consider a conjunctive query compatible with select-project-join.
> > There is no negation, and the variables can be regarded as
> > existentially quantified. The result of the query will be a subset of
> > the result that would be returned from an omniscient database. This
> > could be regarded as consistency without completeness. What more
> > could one hope for in the presence of partial information?
>
> Exactly, so in that sense it is actually complete, and you can make
> that claim precise. The set of tupels in the answer will be exactly
> the set of tuples that are certain to be in the result of the same
> query over the omniscient database. By the nature of the problem every
> query should actually return 2 sets of tuples: the set of certain
> answers, and the set of possible answers. Your operators should
> therefore not operator on relations but on pairs of relations.

Firstly a minor nit pick: you can't say "possible answers", because they don't actually represent an upper bound on the result in the omniscient database.

You don't seem to be accounting for the idea that in my formalism you can interpret a given mv-relation r in different ways, using a project- -project...

    proj(X1, I( proj( X2, r) ) )

This gives you all those different result sets that you desire.

My intersection, union and join operators avoid throwing away any partial information that might be applicable to some subsequent project-interpret-project. Therefore the operators do in a sense operate on multiple conventional relations all at once. That is why they are interesting. Received on Mon Nov 26 2007 - 08:52:09 CET

Original text of this message