Re: Nulls, integrity, the closed world assumption and events

From: dawn <dawnwolthuis_at_gmail.com>
Date: 14 Jan 2007 10:32:48 -0800
Message-ID: <1168799568.194107.221700_at_11g2000cwr.googlegroups.com>


JOG wrote:
> dawn wrote:
> > JOG wrote:
> > > dawn wrote:
> > > > JOG wrote:
> > > > > David wrote:
> > > > >
> > > > > > Brian Tkatch wrote:
> > > > > > > David wrote:
> > > > > > > > Brian Tkatch wrote:
> > > > > > > > > Cimode wrote:
> > > > > > > > > > David wrote:
> > > > > > > > > > > Consider the following relation
> > > > > > > > > > >
> > > > > > > > > > > person(P,M,F) :- person P has mother M, father F.
> > > > > > > > > > >
> > > > > > > > > > > By induction a non-empty
> > > > > > > > > > > database would have to be infinite.
> > > > > > > > > > A false premise makes all deductions coming from it false. Closed
> > > > > > > > > > World Assumption does not say anything about the number of element that
> > > > > > > > > > belong to a domain of values from which one attribute values are
> > > > > > > > > > derived. In the case of a *person* domain, the number of elements in
> > > > > > > > > > the set is certainly finite. Therefore, the number of propositions
> > > > > > > > > > involving person as an attribute is limited as well.
> > > > > > > > >
> > > > > > > > > Would not the question be better stated:
> > > > > > > > >
> > > > > > > > > If a table lists linked chains (with the linked-to link), how is the
> > > > > > > > > final link (whether first or last) stated?
> > > > > > > > >
> > > > > > > > > In such a case i link the item to itself.
> > > > > > > >
> > > > > > > > You suggest a proposition stating that a person is their own
> > > > > > > > mother/father?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > In a cause-effect table that records all causes and effects has to
> > > > > > > assume a circular relationship for either the prime cause or (current)
> > > > > > > final effect.
> > > > > > >
> > > > > > > If the table in question here was to denote parents and their children,
> > > > > > > there would be allowance for an entry of no children, thus ending the
> > > > > > > chain. However, the case here is people and their parents, without the
> > > > > > > option of no parent.
> > > > > > >
> > > > > > > This means the chain must be circular, the question is to which link.
> > > > > > > Being it would create an impossible relationship for a parent to have a
> > > > > > > child as his parent, the circular relationship must be to itself.
> > > > > > >
> > > > > > > So yes, i would indeed suggest a proposition stating that a person is
> > > > > > > their own parent.
> > > > > >
> > > > > > Do you suggest this by analogy to algorithms and data structures
> > > > > > (written in C++ for example) that use the same technique?
> > > > > > Interestingly the more common approach is to use null pointers.
> > > > > >
> > > > > > IMO the RM is founded first and foremost on mathematical logic, and
> > > > > > therefore stating any proposition that is actually wrong must be
> > > > > > avoided. This is also why I think nulls are bad.
> > > > > >
> > > > > > Whilst on the subject of nulls, I have seen a paper written in '83
> > > > > > by Carlo Zaniolo that suggests nulls can represent "no information"
> > > > > > (which encompasses all the more specific interpretations such as
> > > > > > "unknown" and "non-existent"), and this appears to lead to a
> > > > > > decent mathematical model, unlike Codd's 3vl which is far from
> > > > > > compelling.
> > > > >
> > > > > Interesting. However I'd contest that there can any concept of 'missing
> > > > > information' at all at the logical level (where either we know a fact
> > > > > or we don't). It seems to me that information may only be deemed
> > > > > 'missing' at the conceptual level, and that for the database layer
> > > > > proper the term is a misnomer that causes a lot of confusion.
> > > >
> > > > The proposition modeled could be
> > > >
> > > > John Doe, with id 12345, has no cars of which we are aware.
> > > >
> > > > The meaning of the instance of
> > > >
> > > > Person(id,name,cars)
> > >
> > > While the use of the empty set may be preferable to the concept of a
> > > NULL both theoretically and pragmatically, I can see few horrible
> > > issues with it:
> > >
> > > 1) Every element in the column must be a set. Even singletons.
> > > Complexity ensues and it begs the question why id and name would not
> > > also be set values?
> >
> > Maybe this is the case for an elegant theory, but practically a
> > dictionary item (attribute column name of which there may be more than
> > one per attribute in various products) is defined as either having 0..1
> > or 0..M by indicating it is "single-valued" (which implies "or empty")
> > or multi-valued (similarly).

>

> As I said I prefer empty sets to nulls as a stop gap to the missing
> information issue. I find relational decomposition even more intuitive,
> but am still undecided if there is not a better methodology out there
> somewhere than 6NF. However, I am a firm believer that an elegant
> theory will lead to good practice.

Any elegant theory? That has surely steered us wrong in the past. I don't know enough about relativity, quantum theory, string theory and the like, but I'm pretty sure there are examples for my point among those. I, too, think that modeling using relations and applying set-based functions is good, but do not share your intuition for splitting up predicates whenever there is a chance of using the form a null that works with 2VL. I was advocating migrating from the "legacy" Pick approach to and SQL-DBMS approach when I was hit over the head repeatedly with the fact that it was more intuitive for developers and end-users, resulting in fewer data and process defects when using null as empty set with 2VL than null as unknown with 3VL. I suspect there would be few, if any, folks who have worked with each who prefer 3VL and related relational decomposition.

> > > 2) If John Doe has no cars then no proposition that satisfies the
> > > Predicate Person(id, name, cars) will involve him (there is no mention
> > > of sets in the english version of the predicate).
> >
> > Once the predicate (loosely) below
> > person with id has this name and these cars
> >
> > is modeled, this model also permit us to see the propositions that
> > might not have naturally arisen, such as
> >
> > Person 123456 is John Doe who has no cars (of which we are aware)

>

> Well Decomposition allows this without a null by recording that very
> statement in a relation with a corresponding predicate. I am of one
> mind that commenting on missing information should not (yes,
> theoretically) be squashed into a predicate concerning known
> information. However, I also understand we are currently lacking in
> tools that make this process quick and easy to use.

I have yet to understand how it is in any way simpler to developer or use or less costly to maintain when a relation such as (id, name, ssn, status, status_date, former_names, birth_date, address_ids, email_addresses, phones, phone_types, cars, children_ids) turns into 8 1NF tables. I can understand how there is a different kind of constraint handling that goes on, so you have a likelihood of different issues in each. For example, when the logical model for the relation includes multivalues (so it is headed to a DBMS that is not a legacy SQL-DBMS), the referential integrity for parent and child "tables' is automatic when the lists are nested as in the single-relation approach.

> > > In his case facts
> > > state that the part of the predicate "has cars" is inapplicable to him,
> > > and so it seems unintuitive indeed to try and record a corresponding
> > > proposition with a value for 'cars' in it.
> >
> > There is a noticable difference in how intuitive it is for end users to
> > work with systems that treat nulls as empty sets compared to working
> > with solutions written for SQL-DBMS's. It really is more intuitive,
> > both for developers and end-users. I suspect that we judge attributes
> > like id and name to be identifiers of the entity and determine whether
> > we think it is important to have all of the information about cars or
> > not in any particular situation. We don't have any cars recorded for
> > John Doe? heigh ho. We want to put John Doe in the database and we are
> > asked what cars he has, but we don't know them? Just don't put any in,
> > no problem.
> >
> > > 3) Most importantly, using a set to allows multiple values cripples a
> > > database in that it can no longer apply the JOIN mechanism.
> >
> > I understand it changes this aspect, but, it gives you the option of a
> > join with multiples and not just one (we use NEST and UNNEST in
> > practice where I think Date uses GROUP and UNGROUP in tutD). If, on
> > the other hand, you wish to join this person with information about
> > each of their children and just haul the list of cars along as a
> > "property list" related to the person, you can do that handily (without
> > the cross-product of children and cars making things unruly).

>

> I do not agree with this sort of entity thinking - we've been through
> that before ;)

Yes, but people (both developers and end-users) think in terms of "things" (aka objects, aka entities), so replace the term "things" for entity anywhere that it otherwise offends..

>I believe we resolved the arguments to the entity-view
> being QUERY BIASED,

Yes, agreed. We just disagree on how important that is. If someone were to nest a "strong entity" (such as a thing I could touch) under another, that would be poor design and there would be problems due to query bias. Otherwise, as long as the "vocabularly" has what the user needs, queries are quite a bit easier for them (MV end-users often actually do use the language of the database to ask it questions).

> whereas the proposition-view being QUERY-NEUTRAL,

Yes, the data are "democratized" with all nouns somewhat equal (although some are nested as attributes under others that are relations).

> with your view that bias was not a concern, whereas most here would say
> it was crucial for shared data.

I would not say it is of no concern, but that there are tradeoffs and drawing the line in the sand there eliminates some possibly better solutions. Also, the fact that we might not see clearly that our person_cars relation is a child of person makes for a sea of all seemingly "top level" relations when that is not, conceptually, the case.

> Not sure it is worth us going back
> there...

Fair enough. If I ever have a better way to state it, I might return, but we can agree to disagree on that one, recognizing I take a minority position.

> >
> > > This, I now
> > > believe, is the fundamental point of 1NF and Codd's concept of 'atomic
> > > values' (quoted given the ambiguity of that term, and all the perhaps
> > > red-herring discussions it has generated on cdt).
> >
> > Perhaps this concern can be addressed by simply using a product that is
> > not 1NF and seeing that you can still use joins. For (what looks to
> > be) a reasonable SQL implementation on top of NF2, you might want to
> > try Cache'
> >
> > > I've spent a lot of time exploring MV (and one does not need sets to
> > > apply it formally), and I see no way past problem 3 in any formulation
> > > of it.
> >
> > Perhaps I need to understand the issue better. Most MV developers do
> > not employ SQL, although I spent 5 years at while back working with SQL
> > on top of UniData (for read-only applications). Given that I worked
> > with JOINs both on single and multi-valued fields, I'm not catching on
> > to your concern. Sorry to be dense, but if you could give an example
> > of what does not work when one includes nested sets, I would appreciate
> > it. Thanks. --dawn

>

> This is more helpful to me, so below is an example of what I mean -
> please excuse the perhaps unfamiliar syntax with which I describe the
> true statements being recorded:
>

> Non-MV person-table:
> Id(1) & Name(Tom) & Car(Nissan)
> Id(1) & Name(Tom) & Car(Porsche)
> Id(2) & Name(Bob) & Car(Aston Martin)
>

> MV person-table:
> Id(1) & Name(Tom) & Car(Nissan) & Car(Porsche)
> Id(2) & Name(Bob) & Car(Aston Martin)
>

> Now if I JOIN the Non-MV version with a relation describing cars,
> notably why they were designed say, I'd get:
>

> Non-MV Joined-table:
> Id(1) & Name(Tom) & Car(Nissan) & Designed(Japan)
> Id(1) & Name(Tom) & Car(Porsche) & Designed(Italy)
> Id(2) & Name(Bob) & Car(Aston Martin) & Designed(UK)
>

> And I could read the corresponding propositions as "The person with ID
> 1 is called Tom and has a Nissan car which are designed in Japan", for
> example. Now for the non-MV JOIN:
>

> MV person-table:
> Id(1) & Name(Tom) & Car(Nissan) & Car(Porsche) & Designed(Italy) &
> Designed(Japan)
> Id(2) & Name(Bob) & Car(Aston Martin) & Designed(UK)

Actually, when performing a join on a nested attribute, you first "unnest" or "ungroup" it, so in this case you would have the same as your prior join. You get more benefit from this approach when there are multiple multi-values. You can then just unnest the joined attribute and take all others along for the ride as "property lists" without doing the nasty cartesian cross-product thing with every multivalue. This means that, for example, if you want to add one more attribute to your view and it happens to be one that has multiple values, such as phone numbers, you can just add that attribute into your view and all of your reports still work. You do not end adding more rows into your view. Do not underestimate how sweet this is.

> I seem to have lost any correspondence between the Designed role and
> the Car role in the first proposition because of the multivalues,

Yes, because you "forgot" to unnest the attribute on which you were joining.

> especially given the &'s are commutative, and so ordering is
> unimportant. The more I join the more this problem extrapolates.

>

> N.B. If you did want to represent the multivalues as:
> Id(1) & Name(Tom) & Car( { Nissan, Porsche } )
> I see no way of keeping the representation consistent by incorporating
> the Designed role within the value {Nissan, Porsche} as the result of a
> JOIN - unless you use a nested relation such as:
>

> Id(1) & Name(Tom) & Car( { Name(Nissan) , Name(Porsche) } )
>

> in which case the Join applies to the relation value in the Car
> attribute, giving:
> Id(1) & Name(Tom) & Car( { Name(Nissan) & Designed(Japan),
> Name(Porsche) & Designed(Italy) } )
>

> I certainly know which representation appeals to me - and thats the
> first, 1NF example. Clear, intuitive and generating readable
> propositions. regards, J.

Does my response help clarify? --dawn

> >
> > > J.
> > >
> > >
> > > >
> > > > that represents id=12345 name=John Doe cars=null=empty set
> > > > is that we know that the name of person 12345 is John Doe and he has no
> > > > cars of which we are aware. We can drop the "of which we are aware"
> > > > for most purposes, since all data collected is only that of which we
> > > > are aware.
> > > >
> > > > This approach works very well with two-valued logic, permitting us to
> > > > equate the empty set with the empty set. If you need more information
> > > > than this gives, then add attributes to collect more information (such
> > > > as a date on which the cars attribute value was verified). --dawn
Received on Sun Jan 14 2007 - 19:32:48 CET

Original text of this message