Re: Demo: Modelling Cost of Travel Paths Between Towns

From: Neo <neo55592_at_hotmail.com>
Date: 17 Nov 2004 12:16:22 -0800
Message-ID: <4b45d3ad.0411171216.5586f1dc_at_posting.google.com>


> If the correct update strategy is chosen, the data doesn't become corrupt.

Based on our discussions, I now realize that the above aspect of my argument was off the mark. Thanks. I clarify below using your example:

Mary likes John
Paula likes John

Assuming the user already knows John, Mary, Paula and likes, the above is the correct way for a user to express and view the two relations. In particular, he shouldn't need to express or view "Mary likes ->John" or "Paula likes ->John". In fact, that is how XDb2's script handles it, as shown below:

CREATE Mary.like = John;
CREATE Paula.like = John;

The problem is how these things are actually represented in the db: in particular, how they are linked, and what that linking means for flexibility more than for the ability to avoid data corruption. In RM, the two occurrences of John are linked by the data-dependent string 'John'. In TM/XDb2, both are linked to the original John (created earlier) by data-independent references of which the user is unaware, as seen in the two CREATE statements above. (Whether the two data-independent refs have the same or different values, wires or transistors is an irrelevant implementation issue.)
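To make the RM vs TM/XDb2 contrast concrete, here is a minimal Python sketch under simplifying assumptions. A list of tuples stands in for an RM relation, and a hypothetical `Thing` class stands in for a TM/XDb2 node. It is not how XDb2 is actually implemented. It only shows what each representation uses to link the two facts to John.

```python
from dataclasses import dataclass

# Relational style (hypothetical rows): each fact stores the name 'John' itself,
# so the two facts are linked only because the stored strings are equal.
likes_rows = [("Mary", "John"), ("Paula", "John")]

# Reference style (hypothetical Thing class): the name is stored once, inside one
# object, and each fact holds a reference to that object.
@dataclass(eq=False)  # eq=False: compare and hash by identity, not by field values
class Thing:
    name: str

john, mary, paula = Thing("John"), Thing("Mary"), Thing("Paula")
likes_refs = [(mary, john), (paula, john)]

# How many times is the name 'John' stored as data in each representation?
copies_in_rows = sum(row.count("John") for row in likes_rows)
john_objects = {id(obj) for _, obj in likes_refs}

print(copies_in_rows)     # 2: one copy per fact
print(len(john_objects))  # 1: both facts reference the same John object
```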

Instead of saying that redundant data leads to corruption, I should have said that, with synchronization mechanisms turned off, one way to check for redundant data is to modify one John (i.e., to Johnn) and see whether the data becomes unsynchronized. If it does, then one has redundant data. And you are correct that if one wanted to change John to Johnn, there would be no corruption, as long as a synchronization mechanism updated the second John to Johnn based on the fact that the two were originally linked by the string 'John'.
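The rename test just described can be sketched in Python as follows. The sets and lists, the `rows_out_of_sync` helper and the `Thing` class are all hypothetical stand-ins. "Synchronization turned off" is modelled simply by not running any cascade or trigger logic after the rename.

```python
def rows_out_of_sync(persons, likes):
    """Return the liked names in `likes` that no longer match any person."""
    return [obj for _, obj in likes if obj not in persons]

# Value-linked (relational-style) data: the name 'John' is stored in both places.
persons = {"John", "Mary", "Paula"}
likes = [("Mary", "John"), ("Paula", "John")]

# Modify one John only; no synchronization (no cascade, no trigger) runs.
persons = (persons - {"John"}) | {"Johnn"}
print(rows_out_of_sync(persons, likes))   # ['John', 'John'] -> redundant data

# Reference-linked data: both facts point at a single object holding the name.
class Thing:
    def __init__(self, name):
        self.name = name

john = Thing("John")
likes_refs = [(Thing("Mary"), john), (Thing("Paula"), john)]

john.name = "Johnn"                          # modify the one and only John
print([obj.name for _, obj in likes_refs])   # ['Johnn', 'Johnn'] -> nothing to resync
```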

I should not have been emphasizing the issue of keeping redundant data synchronized. That was my mistake. The more significant consequence is the lack of flexibility that results from using data-dependent refs. Because your RM Solution #1 for the Common Ancestor Report and Celko's original RM solution for Travel Paths both used data-dependent refs to link things, both lacked the flexibility to handle certain cases. For example, neither solution could handle multiple things with the same name; you and Alan had to change your respective schemas and supporting code/queries to handle the new case. Redundancy, and thus data-dependent refs, limits flexibility. That is why your RM Solution #2 for the Report can't handle the two cases while XDb1 can. RM#2 still has other redundancies (and you'll never remove them all with RM unless you resort to extreme generic modelling).
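A minimal Python sketch of the same-name case (for instance a person and a dog both named 'john'), contrasting a store keyed by the name itself with one keyed by a system-assigned id. The dictionaries and the `add_by_name`/`add_by_id` helpers are hypothetical and are not taken from either RM solution or from XDb1.

```python
import itertools

# Keyed by name (data-dependent): the name itself is the identifier.
things_by_name = {}

def add_by_name(name, kind):
    if name in things_by_name:
        raise ValueError(f"duplicate key: {name}")
    things_by_name[name] = kind

add_by_name("john", "person")
try:
    add_by_name("john", "dog")
except ValueError as err:
    print("rejected:", err)   # rejected: duplicate key: john

# Keyed by a system-assigned id (data-independent): the name is just an attribute.
_ids = itertools.count(1)
things_by_id = {}

def add_by_id(name, kind):
    thing_id = next(_ids)
    things_by_id[thing_id] = (name, kind)
    return thing_id

person_john = add_by_id("john", "person")
dog_john = add_by_id("john", "dog")
print(things_by_id)  # {1: ('john', 'person'), 2: ('john', 'dog')}
```

In the name-keyed store, admitting the dog means changing the key itself, which is the kind of schema change the paragraph above describes. In the id-keyed store, the second 'john' needs no change.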

In most applications, the degree of flexibility required is largely known at design time, so significant schema/code changes are not very likely later. This is why you keep asking me for exact application specs before starting. In AI-type applications, however, the degree of flexibility required is largely unknown at design time, and the unanticipated is to be anticipated. For AI-type applications, the db should be able to handle almost any future case without R2D2 having to be hauled back to the software lab for a new schema and code. That is why I can't give you exact specs: I want your system to handle any case I might throw at it in the future.

> In this message, I try (as prompted by you) to enter a person and a dog,
> both named 'john', in XDb1. It fail[ed].

True, the NLI had programming errors, while the GUI and API didn't with respect to the above feature. This is not very relevant to the fact that XDb1 represents things without redundancy and uses data-independent refs, whereas your RM solutions did neither.

> > > > In your example, you stored john (a fact) twice.
> > >
> > > You call 'john' a fact?????
> >
> > Yes, 'john' is a fact. In fact, a, b, c ... are facts.
>
> All my arguments to the contrary are here:
> http://dictionary.reference.com/search?q=fact

I will accept that 'john' or 'j' may not meet your or Webster's definition of a fact, but that is irrelevant, since representing a thing (i.e., "john like mary", john, 'john', 'j') twice is redundant, typically leads to data-dependent refs, and limits the flexibility to represent things in the future, as was the case with your and Celko's RM solutions.

Received on Wed Nov 17 2004 - 21:16:22 CET
