Re: RM formalism supporting partial information

From: Brian Selzer <>
Date: Mon, 26 Nov 2007 14:53:36 GMT
Message-ID: <Q5B2j.48166$>

"David BL" <> wrote in message
> On Nov 24, 3:56 am, Jan Hidders <> wrote:
>> On 21 nov, 04:05, David BL <> wrote:
>> > By consistency I only mean the inability to derive self contradictions
>> > that would make it impossible for the definitions to be satisfied in
>> > the first place. Of course it is normally impossible to prove
>> > consistency. I think this is quite different to your usage of
>> > "consistency" in the above. Is that right?
>> Yes. I meant that you are not consistently using or defining a single
>> interpretation of missing data but rather seem to taking sometimes one
>> and sometimes the other depending on what make the math looks cute.
>> That approach is likely to end in elegant but meaningless results.
> AFAIK you say this because of something to do with the CWA but I can't
> get past the feeling that the CWA is incompatible with partial
> information and that's all there is to it! Perhaps you could give me
> an example to back up that statement.

The CWA is not incompatible with partial information. The Closed World Assumption is applied to each individual relation, not to the database as a whole. The content in a relation, EMPLOYEE_AGES, could be "known ages for employees." The absence of a tuple for a particular employee simply means that the age is not known (assuming of course that there is a tuple in EMPLOYEE that asserts the existence of said employee).

> I have wondered whether what you're after is a model that allows the
> CWA to apply to some projections and not to others. Well I agree, but
> I still think it's interesting to investigate the simpler case of
> where the CWA is never applicable.
>> > I would say that a model of partial information will have trouble
>> > formalising consistency or completeness in the following sense: If we
>> > imagine an omniscient, complete database then we can compare the
>> > result of a query against this hypothetical database. Any reasonable
>> > definition of relation difference will be judged as neither
>> > necessarily consistent nor complete with respect to the hypothetical
>> > database. Despite this we still would like a formalism that in
>> > practice is useful for dealing with partial information. Perhaps as
>> > well, we will use the formalism (with its well understood
>> > mathematical properties) as a basis for our intuition rather than the
>> > reverse!
>> The thing is that you are doing it the wrong way around. You should
>> first define in some formalism like logic what the data means (again,
>> see Reiter for how this is done), and only *then* you should start
>> thinking about the operators, which ones you want, and what they
>> exactly mean.
>> > Your reference to completeness above seems fairly easy to meet. The
>> > algebra can be mapped to corresponding proof rules. In other words
>> > given the algebra we can define the subset of FO formulae that can be
>> > proven true (or false). I agree that this is the basis on which we
>> > understand the meaning of the formal data model.
>> I don't think that's what I said. When formalizing the meaning the
>> present and missing data the algebra operators should not be in the
>> picture.
>> > I have wondered whether the interpretation should state that the CWA
>> > applies everywhere and nulls are always interpreted as non-
>> > existence. Under that formalism we have not as you say "left that
>> > safe and well-understood realm" by allowing the value-unknown
>> > interpretation. Note for example that it is on this formal basis
>> > that a restricted form of negation can be defined. As far as the
>> > formalism is concerned we stay strictly within the confines of a 2vl.
>> Correct.
>> > Practically speaking (ie in the presence of partial information) the
>> > user should think of negation as failure to prove true.
>> No, no, no. That's a very sloppy and confusing way of formulating a
>> CWA. E.g., it might depend on the effectiveness of your theorem prover
>> whether something is considered true or not.
> Isn't the whole point that there isn't a CWA in the presence of
> partial information? Closed = Complete! What else could closed mean?
> I might agree with your sentiment more generally for deductive
> databases, but for the restrictive logical inference used in RM,
> thinking of negation as failure to prove true seems simple and
> intuitive.
> It seems to me either you define negation as failure to prove true, or
> disallow negation completely (in the presence of partial information).
>> > We must somehow deal with the inconsistency between a formal
>> > interpretation of null as non-existence, and the user of the DB that
>> > may choose to *weaken* it to non-information. However, I think the
>> > only issue is that the user must be careful to weaken the
>> > interpretation of query results accordingly, and the way to do this is
>> > fairly straightforward.
>> The user is not the problem here, you are. :-) You first need to
>> figure out what the options are that you want to offer to the user and
>> understand their consequences. The best way to do that is by starting
>> to formalize them properly, preferably in terms of a logical theory.
>> Again, see Reiter for how this is done.
>> > Of course this needs some justification. As an illustrative example
>> > consider a conjunctive query compatible with select-project-join.
>> > There is no negation, and the variables can be regarded as
>> > existentially quantified. The result of the query will be a subset of
>> > the result that would be returned from an omniscient database. This
>> > could be regarded as consistency without completeness. What more
>> > could one hope for in the presence of partial information?
>> Exactly, so in that sense it is actually complete, and you can make
>> that claim precise. The set of tupels in the answer will be exactly
>> the set of tuples that are certain to be in the result of the same
>> query over the omniscient database. By the nature of the problem every
>> query should actually return 2 sets of tuples: the set of certain
>> answers, and the set of possible answers. Your operators should
>> therefore not operator on relations but on pairs of relations.
> Firstly a minor nit pick: you can't say "possible answers", because
> they don't actually represent an upper bound on the result in the
> omniscient database.
> You don't seem to be accounting for the idea that in my formalism you
> can interpret a given mv-relation r in different ways, using a project-
> interpret-project...
> proj(X1, I( proj( X2, r) ) )
> This gives you all those different result sets that you desire.
> My intersection, union and join operators avoid throwing away any
> partial information that might be applicable to some subsequent
> project-interpret-project. Therefore the operators do in a sense
> operate on multiple conventional relations all at once. That is why
> they are interesting.
Received on Mon Nov 26 2007 - 15:53:36 CET

Original text of this message