Re: Nulls, integrity, the closed world assumption and events

From: dawn <dawnwolthuis_at_gmail.com>
Date: 14 Jan 2007 01:52:52 -0800
Message-ID: <1168768371.917144.104620_at_m58g2000cwm.googlegroups.com>


JOG wrote:
> dawn wrote:
> > JOG wrote:
> > > David wrote:
> > >
> > > > Brian Tkatch wrote:
> > > > > David wrote:
> > > > > > Brian Tkatch wrote:
> > > > > > > Cimode wrote:
> > > > > > > > David wrote:
> > > > > > > > > Consider the following relation
> > > > > > > > >
> > > > > > > > > person(P,M,F) :- person P has mother M, father F.
> > > > > > > > >
> > > > > > > > > By induction a non-empty
> > > > > > > > > database would have to be infinite.
> > > > > > > > A false premise makes all deductions coming from it false. Closed
> > > > > > > > World Assumption does not say anything about the number of element that
> > > > > > > > belong to a domain of values from which one attribute values are
> > > > > > > > derived. In the case of a *person* domain, the number of elements in
> > > > > > > > the set is certainly finite. Therefore, the number of propositions
> > > > > > > > involving person as an attribute is limited as well.
> > > > > > >
> > > > > > > Would not the question be better stated:
> > > > > > >
> > > > > > > If a table lists linked chains (with the linked-to link), how is the
> > > > > > > final link (whether first or last) stated?
> > > > > > >
> > > > > > > In such a case i link the item to itself.
> > > > > >
> > > > > > You suggest a proposition stating that a person is their own
> > > > > > mother/father?
> > > > >
> > > > > Yes.
> > > > >
> > > > > In a cause-effect table that records all causes and effects has to
> > > > > assume a circular relationship for either the prime cause or (current)
> > > > > final effect.
> > > > >
> > > > > If the table in question here was to denote parents and their children,
> > > > > there would be allowance for an entry of no children, thus ending the
> > > > > chain. However, the case here is people and their parents, without the
> > > > > option of no parent.
> > > > >
> > > > > This means the chain must be circular, the question is to which link.
> > > > > Being it would create an impossible relationship for a parent to have a
> > > > > child as his parent, the circular relationship must be to itself.
> > > > >
> > > > > So yes, i would indeed suggest a proposition stating that a person is
> > > > > their own parent.
> > > >
> > > > Do you suggest this by analogy to algorithms and data structures
> > > > (written in C++ for example) that use the same technique?
> > > > Interestingly the more common approach is to use null pointers.
> > > >
> > > > IMO the RM is founded first and foremost on mathematical logic, and
> > > > therefore stating any proposition that is actually wrong must be
> > > > avoided. This is also why I think nulls are bad.
> > > >
> > > > Whilst on the subject of nulls, I have seen a paper written in '83
> > > > by Carlo Zaniolo that suggests nulls can represent "no information"
> > > > (which encompasses all the more specific interpretations such as
> > > > "unknown" and "non-existent"), and this appears to lead to a
> > > > decent mathematical model, unlike Codd's 3vl which is far from
> > > > compelling.
> > >
> > > Interesting. However I'd contest that there can any concept of 'missing
> > > information' at all at the logical level (where either we know a fact
> > > or we don't). It seems to me that information may only be deemed
> > > 'missing' at the conceptual level, and that for the database layer
> > > proper the term is a misnomer that causes a lot of confusion.
> >
> > The proposition modeled could be
> >
> > John Doe, with id 12345, has no cars of which we are aware.
> >
> > The meaning of the instance of
> >
> > Person(id,name,cars)

>

> While the use of the empty set may be preferable to the concept of a
> NULL both theoretically and pragmatically, I can see few horrible
> issues with it:
>

> 1) Every element in the column must be a set. Even singletons.
> Complexity ensues and it begs the question why id and name would not
> also be set values?

Maybe this is the case for an elegant theory, but practically a dictionary item (attribute column name of which there may be more than one per attribute in various products) is defined as either having 0..1 or 0..M by indicating it is "single-valued" (which implies "or empty") or multi-valued (similarly).

> 2) If John Doe has no cars then no proposition that satisfies the
> Predicate Person(id, name, cars) will involve him (there is no mention
> of sets in the english version of the predicate).

Once the predicate (loosely) below
person with id has this name and these cars

is modeled, this model also permit us to see the propositions that might not have naturally arisen, such as

Person 123456 is John Doe who has no cars (of which we are aware)

> In his case facts
> state that the part of the predicate "has cars" is inapplicable to him,
> and so it seems unintuitive indeed to try and record a corresponding
> proposition with a value for 'cars' in it.

There is a noticable difference in how intuitive it is for end users to work with systems that treat nulls as empty sets compared to working with solutions written for SQL-DBMS's. It really is more intuitive, both for developers and end-users. I suspect that we judge attributes like id and name to be identifiers of the entity and determine whether we think it is important to have all of the information about cars or not in any particular situation. We don't have any cars recorded for John Doe? heigh ho. We want to put John Doe in the database and we are asked what cars he has, but we don't know them? Just don't put any in, no problem.

> 3) Most importantly, using a set to allows multiple values cripples a
> database in that it can no longer apply the JOIN mechanism.

I understand it changes this aspect, but, it gives you the option of a join with multiples and not just one (we use NEST and UNNEST in practice where I think Date uses GROUP and UNGROUP in tutD). If, on the other hand, you wish to join this person with information about each of their children and just haul the list of cars along as a "property list" related to the person, you can do that handily (without the cross-product of children and cars making things unruly).

> This, I now
> believe, is the fundamental point of 1NF and Codd's concept of 'atomic
> values' (quoted given the ambiguity of that term, and all the perhaps
> red-herring discussions it has generated on cdt).

Perhaps this concern can be addressed by simply using a product that is not 1NF and seeing that you can still use joins. For (what looks to be) a reasonable SQL implementation on top of NF2, you might want to try Cache'

> I've spent a lot of time exploring MV (and one does not need sets to
> apply it formally), and I see no way past problem 3 in any formulation
> of it.

Perhaps I need to understand the issue better. Most MV developers do not employ SQL, although I spent 5 years at while back working with SQL on top of UniData (for read-only applications). Given that I worked with JOINs both on single and multi-valued fields, I'm not catching on to your concern. Sorry to be dense, but if you could give an example of what does not work when one includes nested sets, I would appreciate it. Thanks. --dawn

> J.

>
>

> >
> > that represents id=12345 name=John Doe cars=null=empty set
> > is that we know that the name of person 12345 is John Doe and he has no
> > cars of which we are aware. We can drop the "of which we are aware"
> > for most purposes, since all data collected is only that of which we
> > are aware.
> >
> > This approach works very well with two-valued logic, permitting us to
> > equate the empty set with the empty set. If you need more information
> > than this gives, then add attributes to collect more information (such
> > as a date on which the cars attribute value was verified). --dawn
Received on Sun Jan 14 2007 - 10:52:52 CET

Original text of this message