# Re: RM formalism supporting partial information

Date: Mon, 3 Dec 2007 09:32:33 -0800 (PST)

Message-ID: <40d97dbd-760b-4ea2-8534-6ca3f1a4f7f5_at_b40g2000prf.googlegroups.com>

On 3 dec, 02:51, David BL <davi..._at_iinet.net.au> wrote:

> On Dec 2, 9:22 pm, Jan Hidders <hidd..._at_gmail.com> wrote:

> > On 1 dec, 04:17, David BL <davi..._at_iinet.net.au> wrote:
>
> > > On Dec 1, 12:48 am, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > > On 29 nov, 03:54, David BL <davi..._at_iinet.net.au> wrote:
>
> > > > > The concept of "possible answers" isn't universally applicable, and
> > > > > therefore seems to represent quite a problem for any model of partial
> > > > > information that emphasises that concept as fundamental.
>
> > > > The concept of 'possible answers' applies and is well defined for all
> > > > databases where you have precisely defined what it means if certain
> > > > data is missing, and note that this includes the definition that says
> > > > that it means nothing. So what you mean by "isn't universally
> > > > applicable" is completely beyond my comprehension.
>
> > > Consider the following predicates, all with OWA intensional
> > > definitions
>
> > > age(Person,Age)
> > > occupation(Person, Occupation)
> > > married(Person,Person)
> > > died(Person,Date)
>
> > > You say the concept of possible answers is well defined. How exactly
> > > would you calculate the possible 27 year old pilots?
>
> > You seem to assume that "well defined" and "can be computed" are the
> > same, which they aren't. But to answer your question, assuming that
> > everybody has only one occupation that would be every person p for
> > which there is no tuple (p, a) with a<>27 in relation age, and no
> > tuple (p, o) with o<>"pilot" in relation occupation. If the domain of
> > Person is not finite, or restricted by a relation person(Person) then
> > the result may be infinite.
>
> Indeed in the example there is no person(Person) and no CWA specified
> to limit what's possible.

But a type or domain will have been associated with the column. That defines the upper bound of the possibilities.

> Do you still say it is "well defined"?

Of course. This is not some terminology I made up; it is well established in the literature.

> > > What does it mean precisely?
>
> > It contains every person that might be a 27 year old pilot as far as
> > the given database is concerned.
>
> What do you mean by "as far as the given database is concerned"?

That the database contains no information that logically implies the opposite.
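To make that concrete, here is a minimal Python sketch of the "possible 27 year old pilots" computation as described earlier in the thread. It assumes, as stated there, that everybody has exactly one age and one occupation. The finite `PERSON_DOMAIN` and all of the sample data are hypothetical and are there only so the result is finite. The `certain_answers` function is added for contrast and is not spelled out in the discussion above.

```python
# Hypothetical finite domain for Person. Without such a bound the
# set of possible answers may be infinite.
PERSON_DOMAIN = {"ann", "bob", "cho", "dee"}

# Open-world relations: a stored tuple is true, an absent tuple is unknown.
age = {("ann", 27), ("bob", 31)}
occupation = {("ann", "pilot"), ("bob", "pilot"), ("cho", "pilot")}


def possible_answers(domain, age, occupation, want_age, want_job):
    """Persons for whom the database implies nothing to the contrary.

    Assumes each person has exactly one age and one occupation, so a
    stored tuple with a different value rules that person out.
    """
    ruled_out = {p for (p, a) in age if a != want_age}
    ruled_out |= {p for (p, o) in occupation if o != want_job}
    return domain - ruled_out


def certain_answers(age, occupation, want_age, want_job):
    """Persons for whom both facts are actually stored."""
    with_age = {p for (p, a) in age if a == want_age}
    with_job = {p for (p, o) in occupation if o == want_job}
    return with_age & with_job


possible = possible_answers(PERSON_DOMAIN, age, occupation, 27, "pilot")
certain = certain_answers(age, occupation, 27, "pilot")
```

With this data only bob is ruled out (his stored age is 31), so dee, about whom nothing is stored at all, is still a possible answer, while ann is the only certain one.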

> Surely that can only be regarded as CWAs on various intensional
> definitions of the predicates?

Not if you use those terms in their usual meaning.

> > > > > What do you think of the suggestion that the formalism (which is
> > > > > concerned with extensions rather than intensions)
>
> > > > > 1) ignores the CWA/OWA distinction;
>
> > > > > 2) assumes the CWA applies everywhere; and
>
> > > > > 3) null is *always* interpreted as non-existence w.r.t.
> > > > > the (carefully worded) intensional definitions?
>
> > > > > This approach seems simple and self consistent.
>
> > > > If I ignore for the moment 1) (because 1) and 2) seem contradictory
> > > > because I cannot assume there is no difference between X and Y and at
> > > > the same time assume that only Y applies everywhere) this is just the
> > > > classical value-does-not-apply interpretation.
>
> > > I meant that the actual CWA/OWA distinction is absorbed into the
> > > intensional definition, so that it can be assumed that with respect to
> > > the intensional definition the formalism assumes a CWA. I thought
> > > that was clear.
>
> > It was.
>
> > > > > It doesn't however, attempt to model the case of "value exists but is
> > > > > unknown". IMO that case should be modeled *explicitly* with a
> > > > > different predicate.
>
> > > > Sure, the value-does-not-apply interpretation can always also be
> > > > represented without null values.
>
> > > > The thing is that you have now fully ignored the real problem of
> > > > incomplete information which is that in practice the CWA does not
> > > > always fully apply. Your main solution seems to be to redefine the
> > > > meaning of the relations such that it does, which, of course, doesn't
> > > > solve anything at all and simply puts the problem back on the plate of
> > > > the user.
>
> > > You say "of course doesn't solve anything at all" without giving any
> > > hint at all why you say that. Can you elaborate?
>
> > > What problem doesn't it address? Can you provide a specific example?

> > Suppose you have a table R(a,b,c) with candidate key {a} where column
> > c may contain null values that indicate that we don't know its value.
> > You can now solve this by splitting this into R1(a,b) and R2(a,c) and
> > thus remove the null values. It could be that R was, apart from the
> > null values, complete so that would mean that the CWA applies to R1,
> > but not to R2. So it will be the case for some queries over R1 and R2
> > that when computed in the usual way they return the exact answer, some
> > will return the possible answers, and some will return neither.
> > Wouldn't it be nice if the DBMS could tell you which ones do what? Or
> > if you could tell the DBMS that it shouldn't compute the query as
> > given but rather such that it returns the set of possible (or certain)
> > answers for the given query, if it can?
>
> I think if we have the proper intensional definitions in mind, and
> assume every relation has a CWA then we will easily be able to
> interpret a query correctly.
>
> Eg R1(a,b), and R2(a,c) are
>
> employee(Person,Date) <=>
> Person is current employee of company X who commenced on Date.
>
> address(Person,Address) <=>
> It is known that Person lives at Address

This smells a bit tautological. Unless you specify more precisely what "it is known that" means, it might very well be "is in the address table", in which case this would be a meaningless statement.

> then, as an example { P | employee(P,D) } \ { P | address(P,A) } can
> be interpreted as (all) the persons that currently work for company X
> that don't have a known address.
>
> It seems to me there can be lots of subtle variations in the
> intensional definitions, such as the way employee(P,D) mentioned
> company X whereas address(P,A) did not. This makes me skeptical
> whether a DBMS will be able to formalise it, beyond simply allowing
> the user to define a schema, support the RA, allow for integrity
> constraints etc.

Formalizing that can in general be done by formulating a constraint over two databases d and d' with the same schema, where d represents the actual database and d' the ideal, correct database. In general that is too powerful, so various more restricted ways of doing that are under research at the moment. One you have already formulated yourself, i.e., declaring that the CWA applies to certain views. The other is in terms of a relation R and a query Q over the ideal database that returns a result with the same header as R, with the interpretation that R is complete for the tuples in Q.
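As a rough illustration of the second kind of statement, the Python sketch below reads "R is complete for the tuples in Q" as: every tuple that Q returns on the ideal database d' is present in R in the actual database d. That reading, the helper `is_complete_for`, and all of the data are assumptions made for illustration. In practice d' is not available, so such a statement would be declared to the DBMS rather than checked like this.

```python
# Ideal database d' (hypothetical): the complete, correct state of affairs.
ideal = {
    "employee": {("ann", "2001-03-01"), ("bob", "2005-07-15")},
    "address": {("ann", "1 Main St"), ("bob", "2 Side St"), ("cho", "3 Back St")},
}

# Actual database d: what is really stored. The address of cho is missing.
actual = {
    "employee": {("ann", "2001-03-01"), ("bob", "2005-07-15")},
    "address": {("ann", "1 Main St"), ("bob", "2 Side St")},
}


def employee_addresses(db):
    """Q1: address tuples of current employees (same header as address)."""
    employees = {p for (p, _) in db["employee"]}
    return {(p, a) for (p, a) in db["address"] if p in employees}


def all_addresses(db):
    """Q2: every address tuple, i.e. a full CWA on address."""
    return set(db["address"])


def is_complete_for(relation, query, actual_db, ideal_db):
    """True if actual relation R contains every tuple of Q(ideal_db)."""
    return query(ideal_db) <= actual_db[relation]


complete_for_employees = is_complete_for("address", employee_addresses, actual, ideal)
complete_for_everyone = is_complete_for("address", all_addresses, actual, ideal)
```

Here the address relation is complete for the addresses of employees but not for addresses in general, which is the kind of restricted, per-query completeness that a plain CWA on the whole relation cannot express.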

- Jan Hidders