Re: Nulls, integrity, the closed world assumption and events
Date: 21 Jan 2007 17:30:58 -0800
Message-ID: <1169429458.768797.39190_at_m58g2000cwm.googlegroups.com>
dawn wrote:
> JOG wrote:
> > dawn wrote:
> <snip>
> > > 2) When we collect and show data to a human, we are often collecting it
> > > with "blanks" and then showing it back to the user with blanks. For
> > > example, they might have a screen and related reports where the
> > > proposition Person 12345 is a male named John Doe whose hair color is
> > > _________. The user sees the blank and does not fill it in, either
> > > because John Doe is bald or because they do not know his hair color.
> > > Someone else then sees a report that might provide this very
> > > proposition back.
> >
> > This is so wrong my head is starting to hurt. I think it may be the
> > fundamental point that you are grappling with. A record is not a
> > proposition. A proposition is not a record.
>
> I am aware of that. There are two places where propositions reside and
> have logic applied to them, however. We can talk about propositions
> that users "know" and are passing into the computer through their
> interface, right? Can we also talk about the propositions that land
> back in their brains based on the representation they are getting back?
I am 100% sure there are solutions to all of your human/computer interface concerns that can be solved via a solid theoretical model without retrograding to the mistakes of the past, and I beseech you to think forward with solutions to your practical issues. For instance you want to provide multi-value propositions back to the user. 1NF however appears useful (essential?) for proposition manipulation. I see no reason why there cannot be a solution to this that accommodates both standpoints.
> There are two "things" that are using logic to process propositions in
> this picture -- a user and a machine. There is a benefit to them
> applying similar logic and "knowing" similar propositions. I recognize
> that we can take the user's propositions and map them in many different
> ways, splitting them up or whatever. We can also apply a different
> system of logic than the user is likely to use. When looking only at
> the computer side of this system, we could judge it on internal
> consistency, mathematical elegance, etc, but that is not ultimately how
> we judge the overall system (I would hope). We need those users to
> make good decisions based on the information they receive. So, when I
> say that a proposition is passed back to the user, I mean that some
> representation of data is passed back that causes the user to recreate
> that proposition.
>
> I recognize this has to do with the representation of propositions
> (because I'm talking about the interface between a human being and the
> DBMS and/or apps built on the DBMS) and software tools or apps might do
> enough work to make the stored propositions completely unseen by
> everyone (developers, end-users, etc), but then the data model is
> something other than the one used for storage. The "logical data
> model" is the way the propositions are modeled in that interface
> between humans and the DBMS.
>
> > With all the discussions
> > you've had on cdt it worries the bejeezus out of me could one still be
> > confusing the two.
>
> I do not think I am, but I do see how my use of terms was imprecise.
>
> > Propositions do not have holes.
>
> But they can be something like
>
> John Doe has an unknown number of cars.
> This has no hole as a proposition, but maps to a conceptual hole so
> that a human being interpreting this might replace it with the
> "proposition" John Doe has _______ cars. Although the computer does
> not perform logic on such a proposition, the human being does.
>
> > Records can do.
>
> Yes, and we can map the one to the other, right?
>
> > Entities and Records = conceputal layer, Database = logical layer, and
> > never the twain should meet in logic.
>
> Ah, again, there are two places where logic is put to use. 1) the user
> and 2) the machine. The end-user works at the conceptual level. The
> machine at the logical. This is a very critical point, but I don't
> know if you will also think it a tangent (even if I don't, so
> warning...). We can have a computer system that works flawlessly
> within its own system of logic, but we also have human beings who
> sometimes work with the logical model (as we expect the developers to
> do) and sometimes with the conceptual model (as users will often do
> even after considerable training to do otherwise). Some developers are
> savvy enough to attempt to make the external perception of the data
> align with the logic of the end-user, while many are inclined to make
> the user conform to the logic of the tools. This is where I have seen
> a significant difference in productivity and quality related to the
> difference between use of 2VL and 3VL in the DBMS and how that impacts
> the user.
>
> > Entities, Objects & Records are
> > constructed from propositions, query and build - that's at the heart of
> > how we communicate (and why people miscommunicate because they have
> > different rules and contexts for building those entities in their
> > heads).
>
> Agreed, although I would add that often at the start the propositions
> stem from the entities people are thinking about.
>
> > Have you looked at that book "Data and Reality" by William Kent that I
> > once recommended to you? It really helped me recognize that 'record'
> > mistake, and I think it would be an invaluable read in this case.
>
> I read it and loved it. Thanks.
>
> > > It is ambiguous, but if it were important that it
> > > not be, then we could have collected additional data and insisted that
> > > the data entry person tell us why they are leaving it blank. Human
> > > beings are working with these propositions that include blanks
> > > repeatedly, performing logical functions based on them. It seems that
> > > we could model human language and behavior with our stored propositions
> > > and do likewise. In other words, when we model the propositions the
> > > way the user does, we are simulating human behavior, the behavior that
> > > humans would use if the data were stored on paper forms in file
> > > cabinets. It is the way a person would intuitively work with the data.
> > > If we cannot simulate this with a computer, then why not? Again, it
> > > seems to work very well and from what I have seen, users using systems
> > > with this approach to NULL have fewer problems with querying and
> > > working with it. The system then "think like" they do.
> > >
> > > > And to think in terms of holes like this really isn't intuitive - for
> > > > example, say you're writing some XML document. You are hardly going to
> > > > create a tag with nothing in it, when you could have just left the tag
> > > > out altogether are you?
> > >
> > > If it is an XML document to back a report to the user, one might do so.
> > > If we are repeating the tags for each column with each row of data,
> > > then (ugly as that is), so be it. For a null value, I would think you
> > > would have <hairColor /> (or <hairColor></hairColor>) for that row in
> > > your XML.
> > >
> > > > The logic you are following is asking me to put
> > > > in that tag with an empty set inside it.
> > >
> > > Yes, or the shorthand version of that in XML.
> > >
> > > > Framed this way it seems like
> > > > madness (if still preferable to putting an sql-null in there).
> > >
> > > Ah, but it isn't, and likely results in less madness than the SQL
> > > approach.
> >
> > Again frustrating!
> >
> > * You've neglected the point that I was showing you it was
> > /unintuitive/, not possible as some bodge in a specific case.
>
> Perhaps I was not clear that I was disagreeing with you.
>
> > * This was nothing to do with SQL, it was about the uninuitiveness of
> > holes in propositions, why bring something orthogonal like SQL back in!
>
> You might notice that you brought it in (the sentence before I
> mentioned it), but I'll try to be more careful.
>
> > * I could not believe for a moment that someone (especially one who
> > refers to trying to stick good practices) would ever commend an
> > employee for generating like <person><cars>{}<cars></person> instead of
> > just having no tag at all.
>
> If there is a good schema document so that we have the metadata
> elsewhere, then skip overtly indicating missing data in the XML
> document. If this is the source for both data and metadata (which is
> how I was taking it), then there are instances when we would want to
> include all of the information the user would get (which could include
> an empty "cell" under a heading, for example). This is not a major
> point for me, so I'm good with dropping it.
>
> > > > > I was advocating migrating from the "legacy"
> > > > > Pick approach to and SQL-DBMS approach when I was hit over the head
> > > > > repeatedly with the fact that it was more intuitive for developers and
> > > > > end-users, resulting in fewer data and process defects when using null
> > > > > as empty set with 2VL than null as unknown with 3VL. I suspect there
> > > > > would be few, if any, folks who have worked with each who prefer 3VL
> > > > > and related relational decomposition.
> > > >
> > > > I do not see the coupling you make with 3VL and 6NF.
> > >
> > > I don't recall where 6NF was in this mix, but my understanding is that
> > > many (not all) put the elimination of nulls into 1NF and all subsequent
> > > NF's are built on top of that. I did not intend to couple 3VL with 6NF
> > > per se and I don't even recall that 6NF was in this discussion.
> >
> > The connection was that 6NF is the last word in relational
> > decomposition down to irreducible tuples. Let me restate then - I don't
> > see what 3VL has to do with relational decomposition at all, so I
> > thought the paragraph this references was yet another red-herring.
> >
> > The rest I will leave to a later post.
>
> My understanding of relational decomposition included the elimination
> of NULLs somewhere along the line (ignoring other non
> functional-dependency normalization rules that also trouble me, just
> zeroing in on this one). If that is not the case, then I did entirely
> miss on that. If I had that right, then in the case of using 3VL, it
> makes sense to eliminate NULLs (as I have done when modeling for
> SQL-DBMS's). This is not an issue with DBMS's that employ 2VL so that
> when I model for those systems I do not do that aspect of relational
> decomposition (unless I am misunderstanding "relational
> decomposition").
>
> cheers! --dawn
> P.S. Apologies again if I am missing your points, JOG, as I am trying
> to respond to the points presented. I try my best to bridge various
> gaps (including terminology as well as those in my brain).
Received on Mon Jan 22 2007 - 02:30:58 CET