Re: RM formalism supporting partial information

From: David BL <davidbl_at_iinet.net.au>
Date: Tue, 20 Nov 2007 19:05:02 -0800 (PST)
Message-ID: <971e0030-67ce-4882-9549-f6689cd1fee6_at_a28g2000hsc.googlegroups.com>


On Nov 19, 7:11 pm, Jan Hidders <hidd..._at_gmail.com> wrote:
> On 19 nov, 05:20, David BL <davi..._at_iinet.net.au> wrote:
>
>
>
>
>
> > On Nov 17, 7:54 pm, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > On 17 nov, 04:35, David BL <davi..._at_iinet.net.au> wrote:
> > > > Also I think one could argue that the CWA is at odds with a model of
> > > > partial information, unless you keep in mind the idea of "negation as
> > > > failure to prove true".
>
> > > Here we touch the core of why I think your approach is problematic.
> > > The CWA does apply for the does-not-apply interpretation of null
> > > values. It does however not apply in its classical form for the value-
> > > unknown interpretation. Actually Raymond Reiter himself (he introduced
> > > the CWA) explains very well how it then changes in a paper with the
> > > title "A sound and sometimes complete query evaluation algorithm for
> > > relational databases with null values".
>
> > > What you seem to be doing is mixing the two interpretations, resulting
> > > in something that IMO doesn't really seem to have any consistently
> > > meaningful interpretation at all. If you really want to combine the
> > > two interpretations you need to first define the meaning of a relation
> > > with null values in terms of sets of "possible worlds" where the null
> > > values are removed by either giving a concrete value for them or
> > > declaring them not applicable. That might actually be quite
> > > interesting, and I don't remember seeing a paper that did this
> > > properly. Even Zaniolo doesn't really get this right, although he
> > > claims that he combines the two approaches. So you are in very good
> > > company. :-)
>
> > The two main interpretations of null are apparently
>
> > 1. value exists but is unknown
> > 2. value doesn't exist
>
> > Zaniolo combines these into a single interpretation
>
> > 3. no information
>
> > I think you're saying 3 is a mixture of 1,2 and doesn't lead to a
> > consistent interpretation.
>
> It can lead to a consistent interpretation, but I think you and
> Zaniolo don't do it in a consistent way.
>
> > IMO the problem is actually the reverse.
> > It is easy to interpret "no information" (it simply means that the
> > predicate instantiation isn't available for logical deduction),
> > whereas interpretations 1,2 quickly take us outside the realm of the
> > RM/RA. I say that because it seems clear that the RM/RA is only
> > concerned with a strict subset of the FOPL involved with logical
> > deduction from sets of fully instantiated predicates.
>
> Wow. There's so much I disagree with here that I'm not sure where to
> begin. To begin with a detail, the RM/RA naming convention hints for
> me at a deep misunderstanding. The algebra is not an integral part of
> the data model. If you take another query language it is still the
> relational model. If anything, FOL and the related calculi are more
> fundamental for understanding the meaning of the data.

> Next, you claim that your interpretation is simple, but your
> description of it is clearly not complete. A complete description
> tells you which FO sentences are true, false or neither. See Reiter in
> his paper on null values on how this is done properly for the value-
> unknown interpretation. Since you are combining it with another
> interpretation the result will be at least as complex as his. To begin
> with you will need to extend the language of FOL with extra atoms that
> allow places to be not there. So we have next to the classical atoms
> such as P(x, y, z) also P(x, y, _) (for simplicity let's take the
> unlabeled perspective) which says that the third value may be
> undefined. So tell me, given a languages of formulas with the usual
> logical connectives and quantifiers, over such atoms, which formulas
> are true, false and unknown given a certain relation with null values.
> Again, see Reiter for how this is done. Oh, and try to keep it
> simple. :-P

>
> Finally, you say that it seems clear that the RM/RA is only concerned
> with a strict subset of the FOPL involved with logical deduction from
> sets of fully instantiated predicates. To the extent that this is
> true, you have already left that safe and well-understood realm the
> moment you allowed the value-unknown interpretation. Adding the not-
> applicable interpretation makes the situation worse, not better.

Perhaps there is a philosophical difference in our approach to mathematics! My tendency is to use some intuition to write down interesting definitions and then spend a lot of time investigating the properties (ie proving theorems). Success is judged by the following: 1) lots of rich theorems 2) independence from existing mathematical systems 3) consistency 4) utility

By consistency I only mean the inability to derive self contradictions that would make it impossible for the definitions to be satisfied in the first place. Of course it is normally impossible to prove consistency. I think this is quite different to your usage of "consistency" in the above. Is that right?

In a way, if a new mathematical formalism is too easy to interpret in terms of existing mathematics then it is rather worthless. An excellent example is the set of complex numbers - so magical in its mathematical properties and yet impossible to interpret in terms of the reals.

Now I'm not claiming that my formalism meets the above criteria. However, at this stage it does appear to have rich mathematical properties that are interesting to explore. As you say, its interpretation is not so clear. But maybe that's not so bad.

I would say that a model of partial information will have trouble formalising consistency or completeness in the following sense: If we imagine an omniscient, complete database then we can compare the result of a query against this hypothetical database. Any reasonable definition of relation difference will be judged as neither necessarily consistent nor complete with respect to the hypothetical database. Despite this we still would like a formalism that in practice is useful for dealing with partial information. Perhaps as well, we will use the formalism (with its well understood mathematical properties) as a basis for our intuition rather than the reverse!

Your reference to completeness above seems fairly easy to meet. The algebra can be mapped to corresponding proof rules. In other words given the algebra we can define the subset of FO formulae that can be proven true (or false). I agree that this is the basis on which we understand the meaning of the formal data model.

I have wondered whether the interpretation should state that the CWA applies everywhere and nulls are always interpreted as nonexistence.

   Under that formalism we have not as you say "left that safe and well-understood realm" by allowing the value-unknown interpretation. Note for example that it is on this formal basis that a restricted form of negation can be defined. As far as the formalism is concerned we stay strictly within the confines of a 2vl.

Practically speaking (ie in the presence of partial information) the user should think of negation as failure to prove true.

We must somehow deal with the inconsistency between a formal interpretation of null as non-existence, and the user of the DB that may choose to *weaken* it to non-information. However, I think the only issue is that the user must be careful to weaken the interpretation of query results accordingly, and the way to do this is fairly straightforward.

Of course this needs some justification. As an illustrative example consider a conjunctive query compatible with select-project-join. There is no negation, and the variables can be regarded as existentially quantified. The result of the query will be a subset of the result that would be returned from an omniscient database. This could be regarded as consistency without completeness. What more could one hope for in the presence of partial information?

It seems clear that the result will extend to a disjunction of conjunctive queries. However things get more difficult with negation. Not surprising, one must be very careful with how to interpret query results, and I struggle to imagine in what sense one could hope for some guarantee of either consistency or completeness against an omniscient database. Nevertheless a query result can be useful, even if it has more to do with telling us about missing data. Received on Wed Nov 21 2007 - 04:05:02 CET

Original text of this message