Re: Demo: Modelling Cost of Travel Paths Between Towns

From: Neo <neo55592_at_hotmail.com>
Date: 16 Nov 2004 11:47:27 -0800
Message-ID: <4b45d3ad.0411161147.5122707c_at_posting.google.com>


> > redundant strings: Modifying one will corrupt your data.
>
> Okay, so to prove that storing the same string twice is not redundant, ...

Are you reading what you are writing? Of course representing (storing at the logical level) the same string twice is redundant. Isn't that simply obvious?

> ...example:
>
> Mary likes John
> Paula likes John
>
> .. [Mary now like Paul not John]...
>
> Mary likes Paul
> Paula likes John
>
> I modified one. My data is not corrupt.

First, the above example is deficient in that a program could not guarantee that the John on lines 1 and 2 is the same person. You shouldn't rely on matching strings (or have you forgotten the lesson of the same flaw in your RM Solution #1 for the Common Ancestor Report?). Much like the towns in Celko's example, the persons in your example need to be in a separate table to avoid redundancy.

Assuming you meant to keep the example simple and that the John on lines 1 and 2 is the same person, you do have redundant data (the person's name). For example, after having entered the data, you realize that John's name is really spelled Johnn. Changing the first John to Johnn corrupts the data: a program can no longer determine that Johnn on the first line is the same as John on the second line.
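A minimal sketch of that failure, using Python's built-in sqlite3 module. The `likes` table, its column names, and the `likers_of` helper are assumptions made up for illustration; they are not taken from anyone's actual schema in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: both sides of the relation are stored as plain strings.
conn.execute("CREATE TABLE likes (liker TEXT NOT NULL, liked TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [("Mary", "John"), ("Paula", "John")],
)

def likers_of(name):
    """Return everyone who likes the person whose name string matches exactly."""
    rows = conn.execute(
        "SELECT liker FROM likes WHERE liked = ? ORDER BY liker", (name,)
    )
    return [row[0] for row in rows]

before = likers_of("John")

# Correct the spelling in only one of the two rows that mention John.
conn.execute("UPDATE likes SET liked = 'Johnn' WHERE liker = 'Mary'")

after_john = likers_of("John")
after_johnn = likers_of("Johnn")

print(before, after_john, after_johnn)
```

After the update, the two rows no longer agree on who is liked, and nothing in the table records that 'John' and 'Johnn' were meant to be the same person.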

Your example is flawed because you did not modify one of the redundant Johns; you modified a relation involving John ("Mary likes John" to "Mary likes Paul").

> The RM doesn't believe in "one model fits all".

Then you are wiser, as some RM zealots here have been propagating the contrary for years.

> If a customer's business requires operations on a symbol
> (or rather: > character) level, the RM is quite capable of handling it.

I would say RM is capable of it, but it is impractical. All TM/XDb2 dbs are normalized to individual symbols (just by entering simple scripts). Nearly all RM dbs aren't. Would you be willing to script/query some examples to prove or disprove this assertion? For example, find all things named John in a normalized/null-less db that might contain persons, horses, robots, etc. With XDb2, the query is "%.name='john'".
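For comparison, here is one way the relational side of that challenge might be sketched, again with Python's sqlite3. The one-table-per-kind layout, the table and column names, the sample rows, and the case-insensitive match via `lower()` are all assumptions for illustration; a real RM design could be organized quite differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per kind of thing, each with a name column.
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE horse  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE robot  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO person (name) VALUES ('John'), ('Mary');
    INSERT INTO horse  (name) VALUES ('John');
    INSERT INTO robot  (name) VALUES ('Robby');
""")

# The query must list every table that might hold something named John.
query = """
    SELECT 'person' AS kind, id FROM person WHERE lower(name) = 'john'
    UNION ALL
    SELECT 'horse', id FROM horse WHERE lower(name) = 'john'
    UNION ALL
    SELECT 'robot', id FROM robot WHERE lower(name) = 'john'
    ORDER BY kind
"""
things_named_john = conn.execute(query).fetchall()
print(things_named_john)
```

In this layout the query has to be edited whenever a new kind of thing is added, which may be the sort of impracticality at issue here.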

> However, I have not yet encountered any such business.

Then you probably haven't dealt with AI-type applications. Business applications are an important scope, but they aren't the entire scope.

> > [Rhetorical] Why can't one represent the symbol X the same way as
> > representing the person john in RM?
>
> One can: CREATE TABLE Symbols (symbol char(1) NOT NULL PRIMARY KEY)

RM can, as you show above, and it is the same method I showed in OT "A Normalization Question". It's not that RM can't, but that RMers willingly shortcut true relational methodology because it is too cumbersome and offers little benefit within their scope.

> John is a thing (or rather: a person). 'John', 'X', '-> John', 1FB54A and
> 110010100111010011001 are all references to that thing (using different
> referencing schemes), not the thing itself.

The flaw in your understanding of normalization is in the above paragraph. True, the thing in the db representing John is not the same as the real John. In TM/XDb2, there is only one thing within the db that represents the person John. Then there are as many data-independent references to that one John (within the db) as needed. In your simple example, you have represented John twice with no guaranteed mechanism to tie the two together. There is no strong relationship between them of the kind a data-independent ref/id/link provides. You have already demonstrated the weakness/inflexibility of using strings as refs in your RM Sol#1. The ref should be completely independent of the thing being represented (ie don't use its name) and the ref should be mostly hidden from the user (ie can you find one in XDb2 scripts?).

The problem with multiple things in a db representing the real John (ie 'John' and 'John') is how a program can guarantee their synchronization. You can't guarantee it without data-independent refs to the original in the db. This is why I can corrupt your data by changing the first John to Johnn.
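The idea of a data-independent ref can be sketched in plain SQL as well, here run through Python's sqlite3. This is only an illustration under assumed names: the integer `id` column, the `likes` table, and the `likers_of_person` helper are hypothetical, and the sketch is not how TM/XDb2 stores things internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: each person is stored once; relations hold only ids.
conn.executescript("""
    CREATE TABLE T_Person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE likes (
        liker_id INTEGER NOT NULL REFERENCES T_Person(id),
        liked_id INTEGER NOT NULL REFERENCES T_Person(id)
    );
    INSERT INTO T_Person (id, name) VALUES (1, 'Mary'), (2, 'Paula'), (3, 'John');
    INSERT INTO likes VALUES (1, 3), (2, 3);
""")

def likers_of_person(person_id):
    """Return the names of everyone who likes the person with this id."""
    rows = conn.execute(
        "SELECT p.name FROM likes l JOIN T_Person p ON p.id = l.liker_id "
        "WHERE l.liked_id = ? ORDER BY p.name",
        (person_id,),
    )
    return [row[0] for row in rows]

# Fix the spelling in the single place the name is stored.
conn.execute("UPDATE T_Person SET name = 'Johnn' WHERE id = 3")

likers = likers_of_person(3)
liked_names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT p.name FROM likes l JOIN T_Person p ON p.id = l.liked_id"
    )
]
print(likers, liked_names)
```

Because both relation rows point at the same id, the rename shows up in both at once and the link between them survives the change.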

> You still have trouble understanding the basic fact that redundancy in the
> relational model is not about storing _references_ to a "thing" twice, but
> about storing FACTS about "things" twice. Until you understand that basic
> fact, you'll never understand basic normalization.

Funny, you know it, but don't know it. Your explanation is exactly why you have redundant data: you are storing facts about things twice, rather than storing a fact once and a reference to the original fact in the db thereafter. In your simple example, you stored john (a fact) twice. The second john does not have a ref to the original (which should be in T_Person). If the second John had a ref to the original John, then changing the original John to Johnn would not corrupt the db.

Received on Tue Nov 16 2004 - 20:47:27 CET
