Re: RM's Canonical database

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Mon, 03 Jul 2006 18:48:47 GMT
Message-ID: <jEdqg.5739$pu3.129718_at_ursa-nb00s0.nbnet.nb.ca>


Dan wrote:

> Bob Badour wrote:
>

>>guntermann_at_verizon.net wrote:
>>
>>>Bob Badour wrote:
>>>
>>>
>>>>Dan wrote:
>>>>
>>>>
>>>>>Frans Bouma wrote:
>>>>>
>>>>>
>>>>>>Bob Badour wrote:
>>>>>>
>>>>>>
>>>>>>>Ron Jeffries wrote:
>>>>>>>
>>>>>>>
>>>>>>>>On Sat, 01 Jul 2006 11:27:17 +0200, mAsterdam
>>>>>>>><mAsterdam_at_vrijdag.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>Robert Martin wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>... business rules don't belong in the database.
>>>>>>>>>
>>>>>>>>>What, in your opinion, does belong in the database?
>>>>>>>>
>>>>>>>>Uh ... data?
>>>>>>>
>>>>>>>'Data' is information represented suitably for machine processing. In
>>>>>>>what way are business rules not information or not represented
>>>>>>>suitably for machine processing?
>>>>>>
>>>>>>	Bob, are you now suggesting that you don't know the difference between
>>>>>>data and information? No don't bother looking up a Dijkstra quote on
>>>>>>that.
>>>>>>
>>>>>>		FB
>>>>

> [snip]
>
>>>>If I recall correctly, information and data are definitions 2382-01.01
>>>>and 2382-01.02 in that standard. In other words, in the view of the
>>>>folks who created the standard, they are the two most fundamental
>>>>definitions in our profession.
>>>
>>>This is but two of the definitions used for data and information in the
>>>computing sciences.  For example, a book I have, called "Information
>>>Technology - Inside and Out", by Cyganzski, Orr, and Vaz, use the
>>>definition of 'information' as:
>>>
>>>"Knowledge communicated or received concerning some fact or
>>>circumstance; news..".
>>
>>Which is the information theory definition and not the computing science
>>definition.

>
> The authors began with a standard dictionary definition.

If the dictionary doesn't spell out that information means a lot of different things in a lot of different contexts, it is not a good dictionary.

Within the context of the meaning of 'database' and within the context of a statement to the effect that one puts 'data' in a database, the only pertinent definition is the one I gave. You might note in this subthread that I am not the one who used 'data' in an unqualified manner without explaining exactly what I meant by it.

>>>They go on to state that the "world is full of facts, some discovered
>>>and some remaining to be discovered.  These become information when
>>>they are used in some way.  This is the fundamental connection between
>>>information and communication: a fact only becomes a useful as
>>>information when it is communicated."
>>>
>>>One could summarize these authors' distinctions between data and
>>>information as to whether or not it is communicable and received
>>>correctly, implying a process of encoding, transport, and
>>>interpretation by a receiver, whether human or non-human.
>>
>>And if we were talking about a telephone switching system, the
>>definition would be relevant. I thought I was very clear that
>>information theory and signal processing use different definitions. Was
>>I not clear enough?

>
> No, you were very clear Bob. I think the distinction might be somewhat
> artificial though for a variety of reasons. But that is merely my
> opinion.

Every human artifact is artificial including language and definitions. That said, the definition I gave was the only pertinent definition within the context of the subthread when I gave it. It was the only sensible interpretation of Frans' unqualified use of the term 'data', although he clearly was too ignorant and too full of shit to realise what he was saying.

The semantic information definition, the information theory definition and the signal processing definition had no sensible application to the statement that 'data' belong in a database:

A database is a passive set of facts. It does not communicate, and it has no signal to process. Further, because it is a set of facts and because facts have meaning, the semantic information definition that data has no meaning clearly had no application or useful interpretation.
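
To make the point about business rules concrete: once a rule is formally specified in a form the DBMS can process, it is data in the sense used above. What follows is only a minimal sketch, assuming a hypothetical `account` table and a hypothetical rule that a balance may never be negative, written with Python's standard `sqlite3` module. The table, the rule and the helper names are illustrative assumptions and do not come from this thread.

```python
import sqlite3


def build_db():
    """Create an in-memory database whose schema carries one business rule."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical rule: an account balance may never be negative.
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    return conn


def try_insert(conn, account_id, balance):
    """Return True if the DBMS accepts the row, False if the rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO account (id, balance) VALUES (?, ?)",
                (account_id, balance),
            )
        return True
    except sqlite3.IntegrityError:
        # SQLite reports a failed CHECK constraint as an IntegrityError.
        return False


conn = build_db()
accepted = try_insert(conn, 1, 100)  # satisfies the rule
rejected = try_insert(conn, 2, -5)   # violates the rule

# The rule itself is stored in the database catalog alongside the facts.
rule_text = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'account'"
).fetchone()[0]

print(accepted, rejected)
print(rule_text)
```

In this sketch the rule is enforced by the DBMS rather than by application code, and its text can be queried back from the catalog like any other stored value.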

>>>This is very similar to the vein of information theory.
>>
>>It is in fact identical to it.
>>
>>
>>>>>Business rules as logic can be represented symbolically, just as a
>>>>>natural language would do less efficiently, and then have manipulations
>>>>>of them mechanized by a computing system, just as facts as true
>>>>>prepositions are.  Why would the distinction between information and
>>>>>data come into play here?
>>>>
>>>>It comes into play as soon as one formally specifies a business rule in
>>>>a form suitable for machine processing. Before that moment, it is
>>>>information but not data. After that moment, it is both.
>>>
>>>I don't find this distinction as useful as others might, though I won't
>>>argue the fact that they might be entirely valid when stated as a
>>>definition within some well defined context.  It's just not the only
>>>definition, and I find others more useful.
>>
>>Within the context of newsgroups beginning with comp. (and especially
>>any relating to data), the definition I gave is the standard definition.
>>For one to use a different definition, such as the information theory
>>definition or the signal processing definition, one would have to state
>>the context explicitly (unless the context was already very clear.)

>
> I understand your definition and desire to use it Bob.

Do you see why it was the only sensible definition to apply to Frans' statement that 'data' belong in a database?

> But propositional and predicate logic never had to make this
> distinction in order to work with facts, information, and knowledge,
> so making the distinction now seems to muddy the water.

I disagree. Knowing the difference between data and information is very informative. Without that understanding, one will not really appreciate the difference between conceptual analysis and logical design. One will not really appreciate the concept of an external predicate either. One will not appreciate the inherent limitation of all formalisms as expressed by Goedel's Incompleteness Theorem and why that limitation is not as limiting as some might suggest.

> I now deal with "information architects" and I am at a loss as to what
> role and value they provide in contrast to "data architects" or other
> computing specialties. They know very little about the data, but
> claim to know a lot about information. This simply doesn't make sense.

What they are depends on what they are doing. The term has applied in the past to analysts and designers of databases. Recently, the term relates much more to HCI and is a discipline combining psychology and taxonomy. If they are good, I think it is fairer to say they know a lot about how humans process and react to information signals, and they will nevertheless measure the results they get.

>>Because computing touches so many other fields, one must do this from
>>time to time. For example, if one is writing a median filter for a
>>graphics program, one might need to use the signal processing definition
>>wherein thermal noise is information.

>
> It is relevant to any interprocess communication, any interaction
> between modules, anything that requires an interface between two
> heterogenous entities, in fact. This spans both hardware and software
> and thus is relevant to IT and computing sciences as much as the
> definitions that you provide.

You are simply restating what I said above and offering another example context. I think it is fairer to say that communications and signal processing are (mostly) orthogonal to computing science and sometimes relate to implementations in computing science. When discussing communication or signal processing within the context of computing, the context is generally clear that one refers to communications or signal processing. When discussing information technology, as we were doing here, that context is likewise clear.

Similarly, graph theory is orthogonal (well, mostly orthogonal) to the relational model, but one can use graph theory in the analysis and design of a relational database.

>>However, within the context of this discussion, the definition I gave
>>above is the relevant definition and the standard definition. By failing
>>to even recognize it as a valid definition,

>
> Sure. I will accept that. I recognize it as a valid definition, but
> not as the only definition.

Then, I must ask: What do you hope to achieve by agreeing with me so vehemently?

>>>A theoritical treatise written originally in French might be full of
>>>information, but to me it does not constitute much information that is
>>>"processable" at all.  By the same token, a table in a document might
>>>contain data as facts, but not be necessarily in a 'digitized' form nor
>>>processable by a computing processor.  Your definition of data would
>>>exclude this as data and classify it only as information, but other
>>>definitions, including many dictionaries, would define those facts as
>>>data.
>>
>>I disagree. I suggest you try to get ahold of 2382-01 before making
>>absurd claims about the standard definitions for information technology.
>>If you can, try to peruse some of the other relevant documents from the
>>series.

>
> Bob, I will try to get ahold of the 2382-01 definitions, but I am not
> inclined to have someone else's perception of the one and only
> acceptable definition force fed down my throat.

Is it insecurity that makes you think anyone is trying to do that?

This is a topic where precision is important and where self-aggrandizing ignorants and snake-oil salesmen like Frans make nonsense statements with great frequency and apparent conviction. If one is forced to hear them, I think the skill to identify their nonsense is an important skill to have.

In order to do that, one has to have a good grasp of all the definitions of the terms they use and when those definitions apply. Do you disagree?

> Too many make this claim of authoritative definition and I've learned
> not to accept such as face value.

Are you saying you have problems with authority? Knowing 1) the precise definitions of a word, 2) the precise contexts in which the various definitions apply and 3) the ways people frequently misuse a word gives one power over authority and freedom from arbitrary authority.

> Frans's response seems to indicate a different take on the
> definitions and thus this reinforces my claim that the definition
> of data and information is not as definitive and self-evident as some
> may claim.

Frans has demonstrated repeatedly that he is a self-aggrandizing ignorant and a snake-oil salesman. His responses should in no way affect the thinking of intelligent, rational, educated people.

> The terms and the distinction between them are way too abused.

Absolutely, I agree. The semantic information definition appeals to managers who mostly come from sales, marketing or administrative backgrounds. While they don't really understand the philosophy either, the idea that data has no meaning lets them off the hook for not understanding computing science, data processing or information technology.

As a result, you will find the snake-oil salesmen similarly latch onto the semantic information definition to the exclusion of all others. After all, snake-oil salesmen are more interested in extracting money from the managers' budgets than anything else.

Received on Mon Jul 03 2006 - 20:48:47 CEST
