Re: Nulls, integrity, the closed world assumption and events

From: JOG <jog_at_cs.nott.ac.uk>
Date: 17 Jan 2007 17:28:04 -0800
Message-ID: <1169083684.776024.190340_at_v45g2000cwv.googlegroups.com>


dawn wrote:
> JOG wrote:
> [snip]
> > As I said I prefer empty sets to nulls as a stop gap to the missing
> > information issue. I find relational decomposition even more intuitive,
> > but am still undecided if there is not a better methodology out there
> > somewhere than 6NF. However, I am a firm believer that an elegant
> > theory will lead to good practice.
>
> Any elegant theory? That has surely steered us wrong in the past. I
> don't know enough about relativity, quantum theory, string theory and
> the like, but I'm pretty sure there are examples for my point among
> those.

While I had to wait for the noise you've generated again to die down a bit dawn, I must say I do find it hard to keep conversations with yourself on track - you take the conversation off on such tangents to the main point, which then develop a polemic life of their own, that it is very hard to maintain focus. I really hope that this isn't intentional, because I am /sure/ you must know what I was inferring to the truism that good theory leads to good practice. If a theory doesn't lead to good practice, it is a darn good indication that it wasn't a good theory in the first place. Either way, this is all pretty irrelevant and distracting stuff.

> I, too, think that modeling using relations and applying
> set-based functions is good, but do not share your intuition for
> splitting up predicates whenever there is a chance of using the form a
> null that works with 2VL.

You should do though! Why? Because even though an empty set is a value the propositions we state, and predicates we use to describe commonalities, do not contain sets, and would only extremely rarely ever mention sets. I think entrenched techie processes have clouded your intuition - you are thinking in terms of record systems, and what will fit in the 'holes' they engender, instead of considering that when we state facts, by definition, they have no holes! Never!

And to think in terms of holes like this really isn't intuitive - for example, say you're writing some XML document. You are hardly going to create a tag with nothing in it, when you could have just left the tag out altogether are you? The logic you are following is asking me to put in that tag with an empty set inside it. Framed this way it seems like madness (if still preferable to putting an sql-null in there).

> I was advocating migrating from the "legacy"
> Pick approach to and SQL-DBMS approach when I was hit over the head
> repeatedly with the fact that it was more intuitive for developers and
> end-users, resulting in fewer data and process defects when using null
> as empty set with 2VL than null as unknown with 3VL. I suspect there
> would be few, if any, folks who have worked with each who prefer 3VL
> and related relational decomposition.

I do not see the coupling you make with 3VL and 6NF.

>
> > > > 2) If John Doe has no cars then no proposition that satisfies the
> > > > Predicate Person(id, name, cars) will involve him (there is no mention
> > > > of sets in the english version of the predicate).
> > >
> > > Once the predicate (loosely) below
> > > person with id has this name and these cars
> > >
> > > is modeled, this model also permit us to see the propositions that
> > > might not have naturally arisen, such as
> > >
> > > Person 123456 is John Doe who has no cars (of which we are aware)
> >
> > Well Decomposition allows this without a null by recording that very
> > statement in a relation with a corresponding predicate. I am of one
> > mind that commenting on missing information should not (yes,
> > theoretically) be squashed into a predicate concerning known
> > information. However, I also understand we are currently lacking in
> > tools that make this process quick and easy to use.
>
> I have yet to understand how it is in any way simpler to developer or
> use or less costly to maintain when a relation such as (id, name, ssn,
> status, status_date, former_names, birth_date, address_ids,
> email_addresses, phones, phone_types, cars, children_ids) turns into 8
> 1NF tables.

You've craftily (?) mixed up a couple of things here - MV and 6NF.

6NF - well I never said that it would be more efficient or less costly to maintain, so why bring that up? (again I do find it hard to keep the discussion on the rails, when these tangents stemming from debates with other people crop up). Nevertheless, those are physical issue not a logical ones anyhow. Simpler to develop? Well that depends how you need to use the data, but again only an issue whatsoever if one is lacking the correct tools to translate from logical layer to your application code's particular unique conceptual model. I agree we do need better OR mapping tools - or better still application code that handles data better than just with OO. Nevertheless these are all practical issues, not theoretical ones!

MV - As a coder I can promise you that looming Query Bias brings me out in a cold sweat. Using the inevitable MV in OO for example - ugh, iterator loop upon iterator loop upon iterator loop to access the info I need buried three vectors down. Bring on relationship-oriented programming.

[snip]
Dawn wrote:
> JOG wrote:
> > with your view that bias was not a concern, whereas most here would say
> > it was crucial for shared data.
>
> I would not say it is of no concern, but that there are tradeoffs and
> drawing the line in the sand there eliminates some possibly better
> solutions.

This would be a provocative discussion to have in another thread. Imo agreeing that contentions boil down to whether one believes query bias is important or not in the logical model, progresses the debate substantially.

[snip]
> > This is more helpful to me, so below is an example of what I mean -
> > please excuse the perhaps unfamiliar syntax with which I describe the
> > true statements being recorded:
> >
> > Non-MV person-table:
> > Id(1) & Name(Tom) & Car(Nissan)
> > Id(1) & Name(Tom) & Car(Porsche)
> > Id(2) & Name(Bob) & Car(Aston Martin)
> >
> > MV person-table:
> > Id(1) & Name(Tom) & Car(Nissan) & Car(Porsche)
> > Id(2) & Name(Bob) & Car(Aston Martin)
> >
> > Now if I JOIN the Non-MV version with a relation describing cars,
> > notably why they were designed say, I'd get:
> >
> > Non-MV Joined-table:
> > Id(1) & Name(Tom) & Car(Nissan) & Designed(Japan)
> > Id(1) & Name(Tom) & Car(Porsche) & Designed(Italy)
> > Id(2) & Name(Bob) & Car(Aston Martin) & Designed(UK)
> >
> > And I could read the corresponding propositions as "The person with ID
> > 1 is called Tom and has a Nissan car which are designed in Japan", for
> > example. Now for the non-MV JOIN:
> >
> > MV person-table:
> > Id(1) & Name(Tom) & Car(Nissan) & Car(Porsche) & Designed(Italy) &
> > Designed(Japan)
> > Id(2) & Name(Bob) & Car(Aston Martin) & Designed(UK)
>
> Actually, when performing a join on a nested attribute, you first
> "unnest" or "ungroup" it, so in this case you would have the same as
> your prior join.

So you have to reinterpret MV data temporarily into 1NF just prior to a JOIN? (I say reinterpret because I see no simple mathematical transition) Then this is surely an indication that the model requires 1NF for correct manipulation, and remembering that manipulation is one of the three vital tenets of a data model, along with structure and integrity, this is pretty much the raison d'etre of the logical layer! I am becoming more and more inclined to think that it is better /tools/ to translate between layers that could provide you with the functionality you desire.

> You get more benefit from this approach when there
> are multiple multi-values. You can then just unnest the joined
> attribute and take all others along for the ride as "property lists"
> without doing the nasty cartesian cross-product thing with every
> multivalue.

Entities and ownership have nothing to do with propositions. I worry that you often seem to mix use of the terminology of entities and propositions up. In my, granted, very limited experience this suggests an OO, process-centric as opposed to the data-centric outlook.

> This means that, for example, if you want to add one more
> attribute to your view and it happens to be one that has multiple
> values, such as phone numbers, you can just add that attribute into
> your view and all of your reports still work. You do not end adding
> more rows into your view. Do not underestimate how sweet this is.

This is a point however that I think has any basis worth investigating (although it is concerned with schema evolution and query preservation). I am not saying that MV approach you suggest is any solution (after all it requires a similar change from value domains, to set domains for the relevent attribute), but I think the topic is worthy of research.

>
> > I seem to have lost any correspondence between the Designed role and
> > the Car role in the first proposition because of the multivalues,
>
> Yes, because you "forgot" to unnest the attribute on which you were
> joining.
>
> > especially given the &'s are commutative, and so ordering is
> > unimportant. The more I join the more this problem extrapolates.
> >
> > N.B. If you did want to represent the multivalues as:
> > Id(1) & Name(Tom) & Car( { Nissan, Porsche } )
> > I see no way of keeping the representation consistent by incorporating
> > the Designed role within the value {Nissan, Porsche} as the result of a
> > JOIN - unless you use a nested relation such as:
> >
> > Id(1) & Name(Tom) & Car( { Name(Nissan) , Name(Porsche) } )
> >
> > in which case the Join applies to the relation value in the Car
> > attribute, giving:
> > Id(1) & Name(Tom) & Car( { Name(Nissan) & Designed(Japan),
> > Name(Porsche) & Designed(Italy) } )
> >
> > I certainly know which representation appeals to me - and thats the
> > first, 1NF example. Clear, intuitive and generating readable
> > propositions. regards, J.
>
> Does my response help clarify? --dawn

Well yes, the second half does (as I said I found the first half quite bad for getting anywhere here). However, of course I think there is a mistake in not facilitating JOINs without having to do some form of dodgy reinterpretation of a table before hand. And of course propositions don't have ownership sets, or property lists, just ands, ors, nots and therefores.

Y'know, I honestly /wanted/ Multi-Values to make theoretical sense, just as I once wanted binary graphs to make sense. I initially thought that atomic (non-MV) values were perhaps just an unnecessary vestige of Codd's 1969 decision to use mathematical relations before he turned a 'tuple' into a finite partial map with the addition of attribute names in 1970. But evaluating the effects of Query Bias and the effect on Join manipulation has meant I have /had/ to change my thinking on this, to concede that there is something more to 1NF than just fitting data into a relation, and to stop myself banging my head against a brick wall in the name of my own obstinacy. Its been a frustrating trip, but life is too short to get caught down dead ends. J. Received on Thu Jan 18 2007 - 02:28:04 CET

Original text of this message