Re: The TransRelational Model: Clarifications

From: Alfredo Novoa <alfredo_at_ncs.es>
Date: Wed, 17 Nov 2004 16:28:43 GMT
Message-ID: <419b7c39.22593812_at_news.wanadoo.es>


On 16 Nov 2004 10:55:45 -0800, lajos.nagy_at_gmail.com (Josh Hewitt) wrote:

>I am not familiar with the internal implementation techniques of
>main memory DBMSs so I do not have anything to say on the usefulness
>of the TransRelational Model (TRM) in that context.

You should be. Main memory DBMSs are becoming a serious alternative for many projects. RAM storage capacity follows Moore's law.

>Remark: I do not remember either Date or the patent ever mentioning
>that the TRM was intended primarily for main memory DBMSs.

Perhaps it was intended primarily for main memory DBMSs, but Date said that it was extended to be used efficiently with disk based DBMSs.

The brief description of the TRM that Date gives in his book does not address the issues related to the disk based DBMSs, but he is going to publish a whole book about the TRM that covers all that stuff.

> Can you
>point me to references where they do so? I must admit, I was fooled
>into beleiving that it can be used to implement a disk-based DBMS.

I reviewed the appendix yesterday and it is clearly explained in the last page of the appendix. Page 996 if I remember correctly. This should clarify many things. I recomend you to read the whole page paying attention.

>Based on your reasoning, we could index all attributes of each table
>in Oracle because that would speed up complex queries.

Indeed, but we still would have many performance problems with many complex queries. The TRM structures are a lot better than the traditional indices for complex queries.

> Complex
>queries will _fly_!

No, only a narrow range of the queries. The TRM is a lot better for that.

> I hope you do understand the ironic nature of my
>remark. I want to bring to your attention the little but important
>fact that any storage organization should be usable for performing
>_simple_ queries too.

But you are comparing a traditional database that only has the indices you need with a TRM database which has all the attributes "indexed". This is not a fair comparation.

If you read the patent, you will see that the columns that are not needed for lookups might be stored with the traditional approach.

>The one that is described in the patent and advertised by Date.

The patent says explicity that there are many possible embodiments and it also describes the case of the "unsorted" columns.

> I am
>not the inventor nor a developer of the TRM. I just analyzed what had been
>proposed.

But Date says explicitly that issues you are reporting are solved and the solution was implemented. You are reinventing the wheel.

>> The queries of your examples don't need a reconstruction table at all
>> because you restrict over a single attribute.
>
>EMP WHERE ENAME = 'Johnson'
>
>will return a result set that contains all attributes of the EMP
>relation. I restrict on ENAME but I _do_ want to see all the attributes
>of each tuple in the result set. This is why you need the record
>reconstruction (RR) table.

No, because you might store all the attributes in the same value table:

ROW DATA

---    ------------------------------------
  1    Hewitt         40    1998       9000
  2    Johnson        30    1993       5000
  3    Johnson        50    1990       7000
  4    Smith          20    2000       6000

BTW with this structure we could restrict over several multi-attribute combinations very easily:

EMP WHERE ENAME = 'Johnson' AND HIREYEAR > 1992

E# ENAME HIREYEAR SALARY
-- --------- -------- ------
30 Johnson 1993 5000

This is only one of the many many options that the TRM allows.

>> > Actually, I
>> >cannot think of a single query that does not involve lookup and/or RR.
>>
>> -Queries on tables with merged columns.
>
>True.
>
>> -Queries on unidimensional tables.
>> -Queries on tables with only one sorted attribute.
>
>RR does not go away in the cases that you mention.

It does, of course. If you have a relation with a single attribute it is evident that you don't need an RR table.

This is the same with the example I put below.

>> >The performance of lookup and RR cut to the heart of any storage
>> >organization so negative results in this area are not something that
>> >can be handled lightly.
>>
>> Unproved statement.
>>
>
>Unproved statement :)

Indeed, but I hope that the proof (the existing implementation mentioned by Date) will be avaiable soon.

>> We can build trees "indexed" by position.
>
>Maybe you should read those passages again that deal with the problems that
>arise when you try to use a B-Tree index for the columns in the
>Field-Values (FV) tables. The biggest problem is the linked list nature
>of the RR table. When you change the position of some attribute value
>in one column of the FV table you have to update the links in the
>'previous' column in the RR table so they point to the new positions.

I know, I have implemented that, but I have found a way to make a very eficient update using trees.

Perhaps the inventors of the TRM have even better solutions than mine.   

>This is the problem.
>
>Until you show me a solution to the 'moving-values-break-links' problem
>I am going to stick to my assumption that the FV and RR tables cannot
>use B-Trees (or Hashing, or _any_ indexing technique that keeps changing
>the positions of the keys.)

An easy solution is to store an offset value with each block. If you want to displace all the pointers of a branch of the tree, you only need to modify the offset value of that node.

If we insert a new value we only need to phisically displace the pointers of a single block and to propagate the offset displacements over the tree.

This is a rather basic exercice with trees.

> For this very same reason I will not address
>any counter-arguments that assume the usage of B-Trees for indexing
>FV tables (or RR tables for that matter.)

Until now, I hope :)

>Actually, in my original analysis the TRM was competing against a
>traditional DBMS in which the EMP relation was indexed on E# and
>ENAME.
To compete with that you only need to sort the E# and ENAME attributes, and a possible option is to duplicate the table and to have one table only "indexed" by #E and the other by ENAME.

>If many "attributes don't need to be indexed" then what is
>the big deal about the TRM?

The TRM is not very interesting if you only need to make trivial queries like these, but it does not have big problems with them anyway. The flexibility of the TRM allows many different solutions.

>> Wrong again. This would be comparable to a traditional approach with
>> all the attributes indexed, with several multi-attribute indices.
>
>Date often referred to the TRM as an implementation that keeps
>"all attributes simultaneously sorted". So maybe this is not the case
>after all? Maybe?

Date's appendix is only a very brief introduction and not a detailed analisys. It only covers a very little part of the TRM, and it warns about that in a very explicit way.  

>> It would be compensated by an orders of magnitude gain in complex
>> queries.
>
>That is interesting. Complex queries give us negative time so we can
>re-gain what we have lost during inserts,updates and deletions?

Time Traditional TRM
------ ----------- ---

Query        1000        10 
Insert         10       100
------    -----------   ---
Average       505        55

>Again,
>why not put an index on all attributes in Oracle?

Because a well designed TRM structure would be a lot more efficient in terms of space and performance than an Oracle database plenty of indexes.

Regards Received on Wed Nov 17 2004 - 17:28:43 CET

Original text of this message