Re: A Normalization Question

From: Marshall Spight <mspight_at_dnai.com>
Date: Wed, 14 Jul 2004 20:09:34 GMT
Message-ID: <1qgJc.80478$IQ4.24920_at_attbi_s02>


"Neo" <neo55592_at_hotmail.com> wrote in message news:4b45d3ad.0407140826.3ab81640_at_posting.google.com...
> I claim the string 'brown' is a fact.

The string "brown" is a value. A value is not the same thing as a fact. For something to be a fact, it has to exist as part of a proposition, with an associated predicate. If the supposed predicate is "X is a member of a set", we do not call that a fact, but simply a value of a given type.

> You say its
> not. You say "A fact is a true proposition." I say the string 'brown'
> is a true proposition that being: The string 'brown' is composed of
> the symbols 'b', 'r', 'o', 'w' and 'n' in that order.

That would make its predicate:

"X is composed of the symbols A, B, C, D, and E in that order."

which is redundant and denormalized, because X -> A, B, C, D, E. So you can simply eliminate those variables from the predicate. Then the predicate becomes "X is composed of symbols" which is true for every string, so the predicate becomes "X is a string."
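To illustrate the dependency X -> A, B, C, D, E, here is a minimal Java sketch. The class and method names are made up for this example. It only shows that the ordered symbols can be computed from the string alone, so listing them separately in the predicate adds no information.

```java
import java.util.ArrayList;
import java.util.List;

class DerivedSymbolsDemo {
    /** Derives the ordered symbols of x; nothing beyond x itself is needed. */
    static List<Character> symbolsOf(String x) {
        List<Character> symbols = new ArrayList<>();
        for (char c : x.toCharArray()) {
            symbols.add(c);
        }
        return symbols;
    }

    public static void main(String[] args) {
        // The symbols are fully determined by the value "brown".
        System.out.println(symbolsOf("brown")); // prints [b, r, o, w, n]
    }
}
```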

If we are to accept your definition of fact, we would have to conclude that every value is a fact, which is to say, every value is true. Let us consider the boolean domain, with its two values, true and false. (I hope you are able to distinguish "true"-the-string from true-the-boolean.) We have claimed that every value is true; we must therefore assert that the value false is true. False is not true, therefore our assumption that every value is true is false.

RAA. Your confusion lies in the difference between values and variables, the classic distinction that Date & Darwen often speak of. This is also the same thing as the "object identity / object equality" issue of OOP.

In a proposition/fact/table/struct/object/whatever, we have to consider the ramifications of different choices as to how we represent attributes. We can have the attribute value stored directly within the thing, or else we can have a reference to somewhere else. There are two cases to consider: the reference refers to a variable, and the reference refers to a constant. If the two attributes are in fact the same attribute, then we can have references in two places that each refer to the same variable. If they are different, we cannot do this, because to change one would mean changing the other, and that would break the idea that they are different.

We could also have the two references each refer into a constant pool, in which case they are conceptually references to a value. In this case, we have to consider the reference itself to be the value, because we cannot update the constant. If we want to update the man's name, we have to change the reference, or else we are changing the house's color at the same time.
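The two cases can be sketched in Java roughly as follows. This is only an illustration under assumed names: the Cell class and the man's-name / house's-color variables are hypothetical, chosen to match the example in this thread.

```java
class SharedReferenceDemo {
    /** A mutable holder: a reference to it acts like a reference to a variable. */
    static final class Cell {
        String value;
        Cell(String value) { this.value = value; }
    }

    /** Case 1: two different attributes (wrongly) refer to the same variable. */
    static String houseColorAfterUpdatingSharedVariable() {
        Cell shared = new Cell("brown");
        Cell manName = shared;     // the man's name
        Cell houseColor = shared;  // the house's color
        manName.value = "smith";   // update through one reference...
        return houseColor.value;   // ...and the other attribute changes too
    }

    /** Case 2: both attributes refer to the same immutable constant. */
    static String houseColorAfterRebindingReference() {
        String manName = "brown";
        String houseColor = "brown";
        manName = "smith";         // the constant is untouched; only the reference changes
        return houseColor;
    }

    public static void main(String[] args) {
        System.out.println(houseColorAfterUpdatingSharedVariable()); // prints smith
        System.out.println(houseColorAfterRebindingReference());     // prints brown
    }
}
```

In the second case the reference is, for all practical purposes, the value: the only way to "update" the name is to point it at a different constant.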

There is a modest data-compression value to be had in collapsing the storage of values into a constant pool, assuming the values are larger than the references. Java calls this "string interning" and does it automatically for string literals and other compile-time constant strings.
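A short Java sketch of the constant pool at work, using only standard String behavior. The class and method names are invented for this example. Equal literals share one pooled object, a string built at run time does not, and String.intern() maps it back to the pooled one.

```java
class InternDemo {
    /** Two equal literals refer to the same pooled object. */
    static boolean literalsShareStorage() {
        String manName = "brown";
        String houseColor = "brown";
        return manName == houseColor; // identity comparison, not equals()
    }

    /** A string created at run time is equal in value but a distinct object. */
    static boolean runtimeCopyIsDistinct() {
        String literal = "brown";
        String copy = new String("brown");
        return literal != copy && literal.equals(copy);
    }

    /** intern() returns the pooled object for an equal string. */
    static boolean internRestoresSharing() {
        String literal = "brown";
        String copy = new String("brown").intern();
        return literal == copy;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareStorage());  // prints true
        System.out.println(runtimeCopyIsDistinct()); // prints true
        System.out.println(internRestoresSharing()); // prints true
    }
}
```

Because String is immutable, sharing the storage this way is invisible unless the program compares identities with ==.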

Note this issue is independent of the choice of data model; it occurs in C++ and Java, and assembly language for that matter.

Also note that exposing this issue at the logical level is unnecessary. Which is not to say that doing so is a disaster; both C++ and Java do it. But given that it adds no expressiveness or efficiency (since it can be done "under the covers" anyway) and adds considerable complexity to the logical level, any model that does expose this is inferior to one that doesn't.

Marshall

Received on Wed Jul 14 2004 - 22:09:34 CEST
