Re: Order & meaning in a proposition

From: Tony <andrewst_at_onetel.net.uk>
Date: 7 Apr 2004 05:02:41 -0700
Message-ID: <c0e3f26e.0404070402.333586fa_at_posting.google.com>


"Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message news:<c4umqn$eh5$1_at_news.netins.net>...
> "Tony" <andrewst_at_onetel.net.uk> wrote in message
> news:c0e3f26e.0404060507.f28c331_at_posting.google.com...
> > "Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
> news:<c4ss37$lfn$1_at_news.netins.net>...
> > > I broke down and bought the latest Date "An Introduction to Database
> > > Systems" text. It looks very comprehensive on the one hand and I look
> > > forward to reading it cover to cover, even though I have read some of it
> in
> > > prior versions.
> > >
> > > Without having read the entire book, there still seems to be an aspect
> > > missing that is integral to understanding data -- language. When taking
> a
> > > proposition and normalizing it for the purpose of modeling it, there are
> > > times when information is inadvertently lost or left behind because it
> is
> > > not critical. Sample proposition:
> > >
> > > Pat is the host who seated the President and the Secretary of the
> Interior
> > >
> > > If we have a relational model for this proposition, we will end up
> splitting
> > > this proposition up and will undoubtedly lose the order of those who
> were
> > > seated. If Pat seated others too, we will also lose the fact that these
> two
> > > seemed to have been seated together or in close proximity of time or
> place.
> > > There is nothing explicit about the ordering, nor is it considered
> > > important, perhaps, for our software application. However, there is an
> > > ordering here that is not arbitrary -- the President was listed first as
> an
> > > indication of the relative importance of the two who were seated. Even
> if
> > > Pat seated the Secretary of State later, it is likely relevant that such
> > > information is in a separate proposition from the one above.
> > >
> > > Once we split apart a proposition in such a way that we cannot get the
> > > original proposition back, even if we THINK we are getting the important
> > > aspects of it back, we have lost some of the meaning we intended to
> capture.
> > >
> > > This is an off-the-top-of-my-head example of where one might lose
> > > information when normalizing data and likely not a very good example
> > > compared to what might be lost in a typical business application.
> However,
> > > the point is that the process of normalizing data makes it sometimes
> > > impossible to retrieve the original propositions, thereby losing some
> > > information.
> > >
> > > A data modeling process that respects the integrity of the stored
> > > propositions so that they can be retrieved again has something going for
> it
> > > that the relational model lacks, it seems. Any thoughts?
> Thanks. --dawn
> >
> > Your example demonstrates exactly why language is a POOR way to
> > express a proposition, and hence should not be the basis of a data
> > model. You state that sentence and then claim that it also denotes
> > that:
> > a) The President was seated first, then the Secretary
> > b) The President and the Secretary were seated together (maybe)
> > c) The President and the Secretary were seated at the same time
> > (maybe)
> >
> > Well, if it was meant to denote any or all of those things it is not at
> > all clear about it. In fact, these seem to be assumptions rather than
> > true inferences. For all I know it might form part of a longer
> > paragraph:
>
> You are suggesting that narrowing down language to some subset of what has
> been conveyed before we store the data is better. It might be better for
> some aspects of communication, but does not bring with it all of the
> richness of the language that is being modeled. What is conveyed in a
> proposition depends on both what is said and who is interpreting it and
> everything that hearer brings to the equation. The more we have the
> proposition stray from the original, the less it will have the same impact
> on the reader. Take this one step further and if someone writes up a
> document and we parse it apart, make a bunch of propositions from it, and
> generally fragment the document to the point where we cannot get that
> document back in the same way it came in, then we are apt to be conveying a
> different set of information than the original document conveyed.

Well of course it depends on what you are really trying to do. I would not try to "normalise" a Shakespeare play, a poem or even an email from a friend or colleague before storing its information to read later. But if I am trying to record unambiguous FACTS in a database, rather than received text with all its ambiguities intact, then I must first establish those facts and then record them unambiguously. If someone tells me that "fruit flies like a banana" then I need to determine (ask them even!) whether they are talking about the tastes of insects or the aerodynamics of fruit.

> > "Pat is the host who seated the President and the Secretary of the
> > Interior. Pat seated the Secretary when he/she arrived at 8pm, and
> > the President half an hour later. The President sat at the top table
> > and the Secretary sat at the bottom table."
> >
> > Now if the facts are as you believe, then the propositions should be
> > stated as such, e.g.:
> >
> > Pat is the host
> > Pat seated the President at time T1 in location L1
> > Pat seated the Secretary of the Interior at time T2 in location L2
> >
> > That no longer reads like everyday language, because it is now being
> > precise about what it means, which everyday language does not as a
> > rule.
>
> But that lack of precision carries with it information too. Even if we do
> not "trap" the aspects of the language that are not precise, if we do our
> best to pass back the original propositions, in particular keeping the
> ordering of nouns intact, then even if the software/dbms don't understand
> the subtleties, at least when we pass back the information to the reader we
> have not lost such meaning just because we decided to explode the words into
> an unordered structure.

Again, store the text as received if that is appropriate. But don't expect the DBMS to be able to infer any facts from it. Another fatuous example: if I am keeping a record of sales in a shop and am told "I just sold a kilo of bananas and apples", then it is important that I clarify whether we sold a kilo of each or half a kilo of each (or something else). For such purposes, if we deliberately allowed lack of precision in the database it would become useless.
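To make the shop example concrete, here is a rough sketch of the two readings recorded as explicit facts. The names SaleLine, product, kilos and total_kilos are invented purely for illustration and do not come from any real system; the point is only that the two readings become different sets of facts once someone has asked which one was meant.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SaleLine:
    """One unambiguous fact: this much of this product was sold (hypothetical schema)."""
    product: str
    kilos: Decimal


# "I just sold a kilo of bananas and apples" admits at least two readings.
one_kilo_each = {
    SaleLine("bananas", Decimal("1")),
    SaleLine("apples", Decimal("1")),
}
half_kilo_each = {
    SaleLine("bananas", Decimal("0.5")),
    SaleLine("apples", Decimal("0.5")),
}


def total_kilos(lines):
    """Sum the weight across a set of sale lines."""
    return sum((line.kilos for line in lines), Decimal("0"))
```

The raw sentence cannot tell the stock system whether two kilos or one kilo left the shelf; either set of SaleLine facts can.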

Going back to your "Pat and the President" example, if that information is retrieved from the database in a year's time, even Pat may have trouble remembering whether the implied order is a fact or not. Does the order matter, or doesn't it?
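If the order does matter, it can be stated as a fact rather than left to word order. A minimal sketch, assuming the times and tables from my hypothetical longer paragraph (Secretary seated at 8pm, President half an hour later); the Seating type and seating_order function are made up for this illustration:

```python
from datetime import time
from typing import NamedTuple


class Seating(NamedTuple):
    """'host seated guest at seated_at in location' (hypothetical relation)."""
    host: str
    guest: str
    seated_at: time
    location: str


# An unordered set of facts: nothing depends on the order they are written in.
seatings = {
    Seating("Pat", "President", time(20, 30), "top table"),
    Seating("Pat", "Secretary of the Interior", time(20, 0), "bottom table"),
}


def seating_order(facts):
    """Derive the order of seating from the recorded times."""
    return [s.guest for s in sorted(facts, key=lambda s: s.seated_at)]
```

Here the order is derived from recorded times, so it survives however the rows happen to be stored or listed.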

If you are arguing for keeping the source document AS WELL AS the derived facts, you have a valid point. For example, a good analyst who has established the business rule "each employee may have only one manager" as a fact will be able to reference a source document on which that is based - even if it is only his/her interview notes.

Received on Wed Apr 07 2004 - 14:02:41 CEST
