Re: Nulls, integrity, the closed world assumption and events

From: JOG <jog_at_cs.nott.ac.uk>
Date: 14 Jan 2007 06:44:50 -0800
Message-ID: <1168785890.311459.24150_at_m58g2000cwm.googlegroups.com>


dawn wrote:
> JOG wrote:
> > dawn wrote:
> > > JOG wrote:
> > > > David wrote:
> > > >
> > > > > Brian Tkatch wrote:
> > > > > > David wrote:
> > > > > > > Brian Tkatch wrote:
> > > > > > > > Cimode wrote:
> > > > > > > > > David wrote:
> > > > > > > > > > Consider the following relation
> > > > > > > > > >
> > > > > > > > > > person(P,M,F) :- person P has mother M, father F.
> > > > > > > > > >
> > > > > > > > > > By induction a non-empty
> > > > > > > > > > database would have to be infinite.
> > > > > > > > > A false premise makes all deductions coming from it false. Closed
> > > > > > > > > World Assumption does not say anything about the number of element that
> > > > > > > > > belong to a domain of values from which one attribute values are
> > > > > > > > > derived. In the case of a *person* domain, the number of elements in
> > > > > > > > > the set is certainly finite. Therefore, the number of propositions
> > > > > > > > > involving person as an attribute is limited as well.
> > > > > > > >
> > > > > > > > Would not the question be better stated:
> > > > > > > >
> > > > > > > > If a table lists linked chains (with the linked-to link), how is the
> > > > > > > > final link (whether first or last) stated?
> > > > > > > >
> > > > > > > > In such a case i link the item to itself.
> > > > > > >
> > > > > > > You suggest a proposition stating that a person is their own
> > > > > > > mother/father?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > In a cause-effect table that records all causes and effects has to
> > > > > > assume a circular relationship for either the prime cause or (current)
> > > > > > final effect.
> > > > > >
> > > > > > If the table in question here was to denote parents and their children,
> > > > > > there would be allowance for an entry of no children, thus ending the
> > > > > > chain. However, the case here is people and their parents, without the
> > > > > > option of no parent.
> > > > > >
> > > > > > This means the chain must be circular, the question is to which link.
> > > > > > Being it would create an impossible relationship for a parent to have a
> > > > > > child as his parent, the circular relationship must be to itself.
> > > > > >
> > > > > > So yes, i would indeed suggest a proposition stating that a person is
> > > > > > their own parent.
> > > > >
> > > > > Do you suggest this by analogy to algorithms and data structures
> > > > > (written in C++ for example) that use the same technique?
> > > > > Interestingly the more common approach is to use null pointers.
> > > > >
> > > > > IMO the RM is founded first and foremost on mathematical logic, and
> > > > > therefore stating any proposition that is actually wrong must be
> > > > > avoided. This is also why I think nulls are bad.
> > > > >
> > > > > Whilst on the subject of nulls, I have seen a paper written in '83
> > > > > by Carlo Zaniolo that suggests nulls can represent "no information"
> > > > > (which encompasses all the more specific interpretations such as
> > > > > "unknown" and "non-existent"), and this appears to lead to a
> > > > > decent mathematical model, unlike Codd's 3vl which is far from
> > > > > compelling.
> > > >
> > > > Interesting. However I'd contest that there can any concept of 'missing
> > > > information' at all at the logical level (where either we know a fact
> > > > or we don't). It seems to me that information may only be deemed
> > > > 'missing' at the conceptual level, and that for the database layer
> > > > proper the term is a misnomer that causes a lot of confusion.
> > >
> > > The proposition modeled could be
> > >
> > > John Doe, with id 12345, has no cars of which we are aware.
> > >
> > > The meaning of the instance of
> > >
> > > Person(id,name,cars)
> >
> > While the use of the empty set may be preferable to the concept of a
> > NULL both theoretically and pragmatically, I can see few horrible
> > issues with it:
> >
> > 1) Every element in the column must be a set. Even singletons.
> > Complexity ensues and it begs the question why id and name would not
> > also be set values?

>

> Maybe this is the case for an elegant theory, but practically a
> dictionary item (attribute column name of which there may be more than
> one per attribute in various products) is defined as either having 0..1
> or 0..M by indicating it is "single-valued" (which implies "or empty")
> or multi-valued (similarly).

As I said I prefer empty sets to nulls as a stop gap to the missing information issue. I find relational decomposition even more intuitive, but am still undecided if there is not a better methodology out there somewhere than 6NF. However, I am a firm believer that an elegant theory will lead to good practice.

>

> > 2) If John Doe has no cars then no proposition that satisfies the
> > Predicate Person(id, name, cars) will involve him (there is no mention
> > of sets in the english version of the predicate).
>

> Once the predicate (loosely) below
> person with id has this name and these cars
>

> is modeled, this model also permit us to see the propositions that
> might not have naturally arisen, such as
>
> Person 123456 is John Doe who has no cars (of which we are aware)

Well Decomposition allows this without a null by recording that very statement in a relation with a corresponding predicate. I am of one mind that commenting on missing information should not (yes, theoretically) be squashed into a predicate concerning known information. However, I also understand we are currently lacking in tools that make this process quick and easy to use.

>

> > In his case facts
> > state that the part of the predicate "has cars" is inapplicable to him,
> > and so it seems unintuitive indeed to try and record a corresponding
> > proposition with a value for 'cars' in it.
>

> There is a noticable difference in how intuitive it is for end users to
> work with systems that treat nulls as empty sets compared to working
> with solutions written for SQL-DBMS's. It really is more intuitive,
> both for developers and end-users. I suspect that we judge attributes
> like id and name to be identifiers of the entity and determine whether
> we think it is important to have all of the information about cars or
> not in any particular situation. We don't have any cars recorded for
> John Doe? heigh ho. We want to put John Doe in the database and we are
> asked what cars he has, but we don't know them? Just don't put any in,
> no problem.
>

> > 3) Most importantly, using a set to allows multiple values cripples a
> > database in that it can no longer apply the JOIN mechanism.
>

> I understand it changes this aspect, but, it gives you the option of a
> join with multiples and not just one (we use NEST and UNNEST in
> practice where I think Date uses GROUP and UNGROUP in tutD). If, on
> the other hand, you wish to join this person with information about
> each of their children and just haul the list of cars along as a
> "property list" related to the person, you can do that handily (without
> the cross-product of children and cars making things unruly).

I do not agree with this sort of entity thinking - we've been through that before ;) I believe we resolved the arguments to the entity-view being QUERY BIASED, whereas the proposition-view being QUERY-NEUTRAL, with your view that bias was not a concern, whereas most here would say it was crucial for shared data. Not sure it is worth us going back there...

>

> > This, I now
> > believe, is the fundamental point of 1NF and Codd's concept of 'atomic
> > values' (quoted given the ambiguity of that term, and all the perhaps
> > red-herring discussions it has generated on cdt).
>

> Perhaps this concern can be addressed by simply using a product that is
> not 1NF and seeing that you can still use joins. For (what looks to
> be) a reasonable SQL implementation on top of NF2, you might want to
> try Cache'
>

> > I've spent a lot of time exploring MV (and one does not need sets to
> > apply it formally), and I see no way past problem 3 in any formulation
> > of it.
>

> Perhaps I need to understand the issue better. Most MV developers do
> not employ SQL, although I spent 5 years at while back working with SQL
> on top of UniData (for read-only applications). Given that I worked
> with JOINs both on single and multi-valued fields, I'm not catching on
> to your concern. Sorry to be dense, but if you could give an example
> of what does not work when one includes nested sets, I would appreciate
> it. Thanks. --dawn

This is more helpful to me, so below is an example of what I mean - please excuse the perhaps unfamiliar syntax with which I describe the true statements being recorded:

Non-MV person-table:

Id(1) & Name(Tom) & Car(Nissan)
Id(1) & Name(Tom) & Car(Porsche)
Id(2) & Name(Bob) & Car(Aston Martin)

MV person-table:
Id(1) & Name(Tom) & Car(Nissan) & Car(Porsche) Id(2) & Name(Bob) & Car(Aston Martin)

Now if I JOIN the Non-MV version with a relation describing cars, notably why they were designed say, I'd get:

Non-MV Joined-table:

Id(1) & Name(Tom) & Car(Nissan) & Designed(Japan)
Id(1) & Name(Tom) & Car(Porsche) & Designed(Italy)
Id(2) & Name(Bob) & Car(Aston Martin) & Designed(UK)

And I could read the corresponding propositions as "The person with ID 1 is called Tom and has a Nissan car which are designed in Japan", for example. Now for the non-MV JOIN:

MV person-table:
Id(1) & Name(Tom) & Car(Nissan) & Car(Porsche) & Designed(Italy) & Designed(Japan)
Id(2) & Name(Bob) & Car(Aston Martin) & Designed(UK)

I seem to have lost any correspondence between the Designed role and the Car role in the first proposition because of the multivalues, especially given the &'s are commutative, and so ordering is unimportant. The more I join the more this problem extrapolates.

N.B. If you did want to represent the multivalues as: Id(1) & Name(Tom) & Car( { Nissan, Porsche } ) I see no way of keeping the representation consistent by incorporating the Designed role within the value {Nissan, Porsche} as the result of a JOIN - unless you use a nested relation such as:

Id(1) & Name(Tom) & Car( { Name(Nissan) , Name(Porsche) } )

in which case the Join applies to the relation value in the Car attribute, giving:
Id(1) & Name(Tom) & Car( { Name(Nissan) & Designed(Japan),
Name(Porsche) & Designed(Italy) } )

I certainly know which representation appeals to me - and thats the first, 1NF example. Clear, intuitive and generating readable propositions. regards, J.

>

> > J.
> >
> >
> > >
> > > that represents id=12345 name=John Doe cars=null=empty set
> > > is that we know that the name of person 12345 is John Doe and he has no
> > > cars of which we are aware. We can drop the "of which we are aware"
> > > for most purposes, since all data collected is only that of which we
> > > are aware.
> > >
> > > This approach works very well with two-valued logic, permitting us to
> > > equate the empty set with the empty set. If you need more information
> > > than this gives, then add attributes to collect more information (such
> > > as a date on which the cars attribute value was verified). --dawn
Received on Sun Jan 14 2007 - 15:44:50 CET

Original text of this message