Re: Nulls, integrity, the closed world assumption and events

From: dawn <dawnwolthuis_at_gmail.com>
Date: 18 Jan 2007 17:55:29 -0800
Message-ID: <1169171729.167555.276310_at_v45g2000cwv.googlegroups.com>


JOG wrote:
> dawn wrote:

<snip>
> > 2) When we collect and show data to a human, we are often collecting it
> > with "blanks" and then showing it back to the user with blanks. For
> > example, they might have a screen and related reports where the
> > proposition Person 12345 is a male named John Doe whose hair color is
> > _________. The user sees the blank and does not fill it in, either
> > because John Doe is bald or because they do not know his hair color.
> > Someone else then sees a report that might provide this very
> > proposition back.
>
> This is so wrong my head is starting to hurt. I think it may be the
> fundamental point that you are grappling with. A record is not a
> proposition. A proposition is not a record.

I am aware of that. There are two places where propositions reside and have logic applied to them, however. We can talk about propositions that users "know" and are passing into the computer through their interface, right? Can we also talk about the propositions that land back in their brains based on the representation they are getting back?  There are two "things" that are using logic to process propositions in this picture -- a user and a machine. There is a benefit to them applying similar logic and "knowing" similar propositions. I recognize that we can take the user's propositions and map them in many different ways, splitting them up or whatever. We can also apply a different system of logic than the user is likely to use. When looking only at the computer side of this system, we could judge it on internal consistency, mathematical elegance, etc, but that is not ultimately how we judge the overall system (I would hope). We need those users to make good decisions based on the information they receive. So, when I say that a proposition is passed back to the user, I mean that some representation of data is passed back that causes the user to recreate that proposition.

I recognize this has to do with the representation of propositions (because I'm talking about the interface between a human being and the DBMS and/or apps built on the DBMS) and software tools or apps might do enough work to make the stored propositions completely unseen by everyone (developers, end-users, etc), but then the data model is something other than the one used for storage. The "logical data model" is the way the propositions are modeled in that interface between humans and the DBMS.

> With all the discussions
> you've had on cdt it worries the bejeezus out of me could one still be
> confusing the two.

I do not think I am, but I do see how my use of terms was imprecise.

> Propositions do not have holes.

But they can be something like

John Doe has an unknown number of cars.
This has no hole as a proposition, but maps to a conceptual hole so that a human being interpreting this might replace it with the "proposition" John Doe has _______ cars. Although the computer does not perform logic on such a proposition, the human being does.

> Records can do.

Yes, and we can map the one to the other, right?

> Entities and Records = conceputal layer, Database = logical layer, and
> never the twain should meet in logic.

Ah, again, there are two places where logic is put to use. 1) the user and 2) the machine. The end-user works at the conceptual level. The machine at the logical. This is a very critical point, but I don't know if you will also think it a tangent (even if I don't, so warning...). We can have a computer system that works flawlessly within its own system of logic, but we also have human beings who sometimes work with the logical model (as we expect the developers to do) and sometimes with the conceptual model (as users will often do even after considerable training to do otherwise). Some developers are savvy enough to attempt to make the external perception of the data align with the logic of the end-user, while many are inclined to make the user conform to the logic of the tools. This is where I have seen a significant difference in productivity and quality related to the difference between use of 2VL and 3VL in the DBMS and how that impacts the user.

> Entities, Objects & Records are
> constructed from propositions, query and build - that's at the heart of
> how we communicate (and why people miscommunicate because they have
> different rules and contexts for building those entities in their
> heads).

Agreed, although I would add that often at the start the propositions stem from the entities people are thinking about.

> Have you looked at that book "Data and Reality" by William Kent that I
> once recommended to you? It really helped me recognize that 'record'
> mistake, and I think it would be an invaluable read in this case.

I read it and loved it. Thanks.

> > It is ambiguous, but if it were important that it
> > not be, then we could have collected additional data and insisted that
> > the data entry person tell us why they are leaving it blank. Human
> > beings are working with these propositions that include blanks
> > repeatedly, performing logical functions based on them. It seems that
> > we could model human language and behavior with our stored propositions
> > and do likewise. In other words, when we model the propositions the
> > way the user does, we are simulating human behavior, the behavior that
> > humans would use if the data were stored on paper forms in file
> > cabinets. It is the way a person would intuitively work with the data.
> > If we cannot simulate this with a computer, then why not? Again, it
> > seems to work very well and from what I have seen, users using systems
> > with this approach to NULL have fewer problems with querying and
> > working with it. The system then "think like" they do.
> >
> > > And to think in terms of holes like this really isn't intuitive - for
> > > example, say you're writing some XML document. You are hardly going to
> > > create a tag with nothing in it, when you could have just left the tag
> > > out altogether are you?
> >
> > If it is an XML document to back a report to the user, one might do so.
> > If we are repeating the tags for each column with each row of data,
> > then (ugly as that is), so be it. For a null value, I would think you
> > would have <hairColor /> (or <hairColor></hairColor>) for that row in
> > your XML.
> >
> > > The logic you are following is asking me to put
> > > in that tag with an empty set inside it.
> >
> > Yes, or the shorthand version of that in XML.
> >
> > > Framed this way it seems like
> > > madness (if still preferable to putting an sql-null in there).
> >
> > Ah, but it isn't, and likely results in less madness than the SQL
> > approach.
>
> Again frustrating!
>
> * You've neglected the point that I was showing you it was
> /unintuitive/, not possible as some bodge in a specific case.

Perhaps I was not clear that I was disagreeing with you.

> * This was nothing to do with SQL, it was about the uninuitiveness of
> holes in propositions, why bring something orthogonal like SQL back in!

You might notice that you brought it in (the sentence before I mentioned it), but I'll try to be more careful.

> * I could not believe for a moment that someone (especially one who
> refers to trying to stick good practices) would ever commend an
> employee for generating like <person><cars>{}<cars></person> instead of
> just having no tag at all.

If there is a good schema document so that we have the metadata elsewhere, then skip overtly indicating missing data in the XML document. If this is the source for both data and metadata (which is how I was taking it), then there are instances when we would want to include all of the information the user would get (which could include an empty "cell" under a heading, for example). This is not a major point for me, so I'm good with dropping it.

> > > > I was advocating migrating from the "legacy"
> > > > Pick approach to and SQL-DBMS approach when I was hit over the head
> > > > repeatedly with the fact that it was more intuitive for developers and
> > > > end-users, resulting in fewer data and process defects when using null
> > > > as empty set with 2VL than null as unknown with 3VL. I suspect there
> > > > would be few, if any, folks who have worked with each who prefer 3VL
> > > > and related relational decomposition.
> > >
> > > I do not see the coupling you make with 3VL and 6NF.
> >
> > I don't recall where 6NF was in this mix, but my understanding is that
> > many (not all) put the elimination of nulls into 1NF and all subsequent
> > NF's are built on top of that. I did not intend to couple 3VL with 6NF
> > per se and I don't even recall that 6NF was in this discussion.
>
> The connection was that 6NF is the last word in relational
> decomposition down to irreducible tuples. Let me restate then - I don't
> see what 3VL has to do with relational decomposition at all, so I
> thought the paragraph this references was yet another red-herring.
>
> The rest I will leave to a later post.

My understanding of relational decomposition included the elimination of NULLs somewhere along the line (ignoring other non functional-dependency normalization rules that also trouble me, just zeroing in on this one). If that is not the case, then I did entirely miss on that. If I had that right, then in the case of using 3VL, it makes sense to eliminate NULLs (as I have done when modeling for SQL-DBMS's). This is not an issue with DBMS's that employ 2VL so that when I model for those systems I do not do that aspect of relational decomposition (unless I am misunderstanding "relational decomposition").

cheers! --dawn
P.S. Apologies again if I am missing your points, JOG, as I am trying to respond to the points presented. I try my best to bridge various gaps (including terminology as well as those in my brain). Received on Fri Jan 19 2007 - 02:55:29 CET

Original text of this message