# Re: RM formalism supporting partial information

Date: Fri, 23 Nov 2007 10:56:16 -0800 (PST)

Message-ID: <198e02c2-ffb1-4e94-901f-95976fa3acaa_at_x69g2000hsx.googlegroups.com>

On 21 nov, 04:05, David BL <davi..._at_iinet.net.au> wrote:

> On Nov 19, 7:11 pm, Jan Hidders <hidd..._at_gmail.com> wrote:

*>
**>
**>
**> > On 19 nov, 05:20, David BL <davi..._at_iinet.net.au> wrote:
**>
**> > > On Nov 17, 7:54 pm, Jan Hidders <hidd..._at_gmail.com> wrote:
**>
**> > > > On 17 nov, 04:35, David BL <davi..._at_iinet.net.au> wrote:
**> > > > > Also I think one could argue that the CWA is at odds with a model of
**> > > > > partial information, unless you keep in mind the idea of "negation as
**> > > > > failure to prove true".
**>
**> > > > Here we touch the core of why I think your approach is problematic.
**> > > > The CWA does apply for the does-not-apply interpretation of null
**> > > > values. It does however not apply in its classical form for the value-
**> > > > unknown interpretation. Actually Raymond Reiter himself (he introduced
**> > > > the CWA) explains very well how it then changes in a paper with the
**> > > > title "A sound and sometimes complete query evaluation algorithm for
**> > > > relational databases with null values".
**>
**> > > > What you seem to be doing is mixing the two interpretations, resulting
**> > > > in something that IMO doesn't really seem to have any consistently
**> > > > meaningful interpretation at all. If you really want to combine the
**> > > > two interpretations you need to first define the meaning of a relation
**> > > > with null values in terms of sets of "possible worlds" where the null
**> > > > values are removed by either giving a concrete value for them or
**> > > > declaring them not applicable. That might actually be quite
**> > > > interesting, and I don't remember seeing a paper that did this
**> > > > properly. Even Zaniolo doesn't really get this right, although he
**> > > > claims that he combines the two approaches. So you are in very good
**> > > > company. :-)
**>
**> > > The two main interpretations of null are apparently
**>
**> > > 1. value exists but is unknown
**> > > 2. value doesn't exist
**>
**> > > Zaniolo combines these into a single interpretation
**>
**> > > 3. no information
**>
**> > > I think you're saying 3 is a mixture of 1,2 and doesn't lead to a
**> > > consistent interpretation.
**>
**> > It can lead to a consistent interpretation, but I think you and
**> > Zaniolo don't do it in a consistent way.
**>
**> > > IMO the problem is actually the reverse.
**> > > It is easy to interpret "no information" (it simply means that the
**> > > predicate instantiation isn't available for logical deduction),
**> > > whereas interpretations 1,2 quickly take us outside the realm of the
**> > > RM/RA. I say that because it seems clear that the RM/RA is only
**> > > concerned with a strict subset of the FOPL involved with logical
**> > > deduction from sets of fully instantiated predicates.
**>
**> > Wow. There's so much I disagree with here that I'm not sure where to
**> > begin. To begin with a detail, the RM/RA naming convention hints for
**> > me at a deep misunderstanding. The algebra is not an integral part of
**> > the data model. If you take another query language it is still the
**> > relational model. If anything, FOL and the related calculi are more
**> > fundamental for understanding the meaning of the data.
**> > Next, you claim that your interpretation is simple, but your
**> > description of it is clearly not complete. A complete description
**> > tells you which FO sentences are true, false or neither. See Reiter in
**> > his paper on null values on how this is done properly for the value-
**> > unknown interpretation. Since you are combining it with another
**> > interpretation the result will be at least as complex as his. To begin
**> > with you will need to extend the language of FOL with extra atoms that
**> > allow places to be not there. So we have next to the classical atoms
**> > such as P(x, y, z) also P(x, y, _) (for simplicity let's take the
**> > unlabeled perspective) which says that the third value may be
**> > undefined. So tell me, given a languages of formulas with the usual
**> > logical connectives and quantifiers, over such atoms, which formulas
**> > are true, false and unknown given a certain relation with null values.
**> > Again, see Reiter for how this is done. Oh, and try to keep it
**> > simple. :-P
**>
**> > Finally, you say that it seems clear that the RM/RA is only concerned
**> > with a strict subset of the FOPL involved with logical deduction from
**> > sets of fully instantiated predicates. To the extent that this is
**> > true, you have already left that safe and well-understood realm the
**> > moment you allowed the value-unknown interpretation. Adding the not-
**> > applicable interpretation makes the situation worse, not better.
**>
**> Perhaps there is a philosophical difference in our approach to
**> mathematics!
*

Database theory is largely applied mathematics. You seem to be more inclined to the ways of pure mathematics. That's fine, of course, but I reserve the right to be a bit skeptical about the usefulness of your approach. :-)

> By consistency I only mean the inability to derive self contradictions

*> that would make it impossible for the definitions to be satisfied in
**> the first place. Of course it is normally impossible to prove
**> consistency. I think this is quite different to your usage of
**> "consistency" in the above. Is that right?
*

Yes. I meant that you are not consistently using or defining a single interpretation of missing data but rather seem to taking sometimes one and sometimes the other depending on what make the math looks cute. That approach is likely to end in elegant but meaningless results.

> I would say that a model of partial information will have trouble

*> formalising consistency or completeness in the following sense: If we
**> imagine an omniscient, complete database then we can compare the
**> result of a query against this hypothetical database. Any reasonable
**> definition of relation difference will be judged as neither
**> necessarily consistent nor complete with respect to the hypothetical
**> database. Despite this we still would like a formalism that in
**> practice is useful for dealing with partial information. Perhaps as
**> well, we will use the formalism (with its well understood
**> mathematical properties) as a basis for our intuition rather than the
**> reverse!
*

The thing is that you are doing it the wrong way around. You should first define in some formalism like logic what the data means (again, see Reiter for how this is done), and only *then* you should start thinking about the operators, which ones you want, and what they exactly mean.

> Your reference to completeness above seems fairly easy to meet. The

*> algebra can be mapped to corresponding proof rules. In other words
**> given the algebra we can define the subset of FO formulae that can be
**> proven true (or false). I agree that this is the basis on which we
**> understand the meaning of the formal data model.
*

I don't think that's what I said. When formalizing the meaning the present and missing data the algebra operators should not be in the picture.

> I have wondered whether the interpretation should state that the CWA

*> applies everywhere and nulls are always interpreted as non-
**> existence. Under that formalism we have not as you say "left that
**> safe and well-understood realm" by allowing the value-unknown
**> interpretation. Note for example that it is on this formal basis
**> that a restricted form of negation can be defined. As far as the
**> formalism is concerned we stay strictly within the confines of a 2vl.
*

Correct.

> Practically speaking (ie in the presence of partial information) the

*> user should think of negation as failure to prove true.
*

No, no, no. That's a very sloppy and confusing way of formulating a CWA. E.g., it might depend on the effectiveness of your theorem prover whether something is considered true or not.

> We must somehow deal with the inconsistency between a formal

*> interpretation of null as non-existence, and the user of the DB that
**> may choose to *weaken* it to non-information. However, I think the
**> only issue is that the user must be careful to weaken the
**> interpretation of query results accordingly, and the way to do this is
**> fairly straightforward.
*

The user is not the problem here, you are. :-) You first need to figure out what the options are that you want to offer to the user and understand their consequences. The best way to do that is by starting to formalize them properly, preferably in terms of a logical theory. Again, see Reiter for how this is done.

> Of course this needs some justification. As an illustrative example

*> consider a conjunctive query compatible with select-project-join.
**> There is no negation, and the variables can be regarded as
**> existentially quantified. The result of the query will be a subset of
**> the result that would be returned from an omniscient database. This
**> could be regarded as consistency without completeness. What more
**> could one hope for in the presence of partial information?
*

Exactly, so in that sense it is actually complete, and you can make that claim precise. The set of tupels in the answer will be exactly the set of tuples that are certain to be in the result of the same query over the omniscient database. By the nature of the problem every query should actually return 2 sets of tuples: the set of certain answers, and the set of possible answers. Your operators should therefore not operator on relations but on pairs of relations.

- Jan Hidders