Re: The TransRelational Model: Clarifications

From: Josh Hewitt <lajos.nagy_at_gmail.com>
Date: 19 Nov 2004 11:59:15 -0800
Message-ID: <1c92edeb.0411191159.1a6313ba_at_posting.google.com>


alfredo_at_ncs.es (Alfredo Novoa) wrote in message news:<419b7c39.22593812_at_news.wanadoo.es>...

> >Remark: I do not remember either Date or the patent ever mentioning
> >that the TRM was intended primarily for main memory DBMSs.
>
> Perhaps it was intended primarily for main memory DBMSs, but Date said
> that it was extended to be used efficiently with disk based DBMSs.
>
> The brief description of the TRM that Date gives in his book does not
> address the issues related to the disk based DBMSs, but he is going to
> publish a whole book about the TRM that covers all that stuff.
>
> > Can you
> >point me to references where they do so? I must admit, I was fooled
> >into beleiving that it can be used to implement a disk-based DBMS.
>
> I reviewed the appendix yesterday and it is clearly explained in the
> last page of the appendix. Page 996 if I remember correctly. This
> should clarify many things. I recomend you to read the whole page
> paying attention.

Yeah, you are right. Actually I received an email from Fabian Pascal in which he confirmed (quoting Chris Date) that the algorithms and data structures described in the appendix (and also in the patent) are "for in memory/no updates" only. (However, maybe this 'feature' should be stressed a bit more when advocating the TRM.)

At least the analysis gives us clues as to why this might be the case.

This makes the discussion sort of moot but I would like to address one more thing before the topic is closed. It has to do with B-Trees.

> >> We can build trees "indexed" by position.
> >
> >Maybe you should read those passages again that deal with the problems that
> >arise when you try to use a B-Tree index for the columns in the
> >Field-Values (FV) tables. The biggest problem is the linked list nature
> >of the RR table. When you change the position of some attribute value
> >in one column of the FV table you have to update the links in the
> >'previous' column in the RR table so they point to the new positions.
>
> I know, I have implemented that, but I have found a way to make a very
> eficient update using trees.
>
> Perhaps the inventors of the TRM have even better solutions than mine.
>
> >This is the problem.
> >
> >Until you show me a solution to the 'moving-values-break-links' problem
> >I am going to stick to my assumption that the FV and RR tables cannot
> >use B-Trees (or Hashing, or _any_ indexing technique that keeps changing
> >the positions of the keys.)
>
> An easy solution is to store an offset value with each block. If you
> want to displace all the pointers of a branch of the tree, you only
> need to modify the offset value of that node.
>
> If we insert a new value we only need to phisically displace the
> pointers of a single block and to propagate the offset displacements
> over the tree.
>
> This is a rather basic exercice with trees.

Been there, done that.

(Interestingly, I came up with the exact same solution when I first tried to solve the problem.)

It was pointed out that positional links could pose problems if attribute values in the SALARY column change their positions. In order to maintain the integrity of the RR table, links (in this case, positions) in the 'previous' column must be updated so they point to the new (correct) positions.

In your solution you suggest using 'offset' labels that we can attach to nodes of the B-Tree. These offsets can be propagated down the tree. For example, when you insert a new value into the SALARY column all values that come after the new value must be shifted by one. You plan to achieve this by attaching an offset of, say, 1 to the subtree that contains all the values that are larger than the new value. This way, positional links in the previous column that pointed to the nth value will now point to the (n + 1)th value (of course, I mean only those links that pointed to values that are larger than the newly inserted value.) We have a solution, right?

Wrong.

Unfortunately, the solution does not work. Let me show you why. Take the SALARY column. Let's suppose we have two values 6000 and 7000 with their positions in the sorted order being 10 and 11 respectively. Because of the linked list nature of the RR table the 'previous' column, say HIREYEAR, will contain these two positions. This way records can be re-constructed.

Now, let's insert the value 6500 into the SALARY column. The new value comes between 6000 and 7000 so its position will be 11. You add an offset of 1 to the branch of the B-Tree that contains the values that are larger than 6500 so that 7000 will have a position of 12 from now on. The question is: what position (link) will you insert into the previous column, say HIREYEAR, of the RR table for the new value 6500? 11? But there is already a position 11 in the HIREYEAR column! It is the old link to the value 7000. You cannot have two 11's in the HIREYEAR column so you have to do something about it. What can you do? Update the 'old' 11 to 12? But there is already a 12 in HIREYEAR! Update that too? Well?

The problem is that offsetting can shift positions but cannot change the _distance_ between two positional links! This is where your solution breaks down. If I have the links 10 and 11 in the HIREYEAR column in the RR table you cannot insert anything between them without doing some serious updating of the links. (Which is why I said before that B-Trees cannot be used for indexing the columns of the FV table.)

The only possible fix is to leave 'gaps' between positional links so that when you insert a new value it can always be inserted between two old ones. Unfortunately, this 'fix' kills the nice property of links that they are positional, so array-based access is possible. But this means that you need a separate mapping structure that maps links to attribute values and positions in the RR table. And this mapping structure must be there for each and every sorted attribute. If you think about it the whole structure begins to resemble a traditional relational table with a bunch of B-Tree indexes defined on it. (They call it inverted table because we store the attribute values only once, in the B-Tree indexes.)

Best Regards,

Josh Hewitt
Ph.D. Student
Florida Institute of Technology Received on Fri Nov 19 2004 - 20:59:15 CET

Original text of this message