Re: The TransRelational Model: Clarifications

From: Josh Hewitt <lajos.nagy_at_gmail.com>
Date: 16 Nov 2004 10:55:45 -0800
Message-ID: <1c92edeb.0411161055.326831e5_at_posting.google.com>

Alfredo Novoa <anovoa_at_ncs.es> wrote in message news:<sdvhp0hpmpgk0ughmjkolk7dc5onbqmok8_at_4ax.com>...

I am not familiar with the internal implementation techniques of main memory DBMSs so I do not have anything to say on the usefulness of the TransRelational Model (TRM) in that context. For this reason I will not address those remarks that assume that the majority of data is residing in memory.

Remark: I do not remember either Date or the patent ever mentioning that the TRM was intended primarily for main memory DBMSs. Can you point me to references where they do so? I must admit, I was fooled into beleiving that it can be used to implement a disk-based DBMS.

> >First of all, my analysis is just a preliminary, back-of-napkin type
> >calculation that I did in order to establish the feasibility (or lack
> >thereof) of implementing the relational model using the TRM.
>
> Then why you gave striking conclusions if the analisys was incomplete?
>

Because of the striking results. 'back-of-napkin' implies that whatever we are looking at should not look _too_ bad on the back of the napkin so it might be worth our further efforts.

> > My
> >reasoning was that any new proposal for storage organization should prove
> >itself on some simple examples before checking it against more complex
> >ones.
>
> Should prove what?
>
> Even if a new proposal is significantly worse by a constant factor on
> the simplest cases, it could be very interesting whether it is a lot
> better on the complex cases where the traditional approaches perform
> horribly.
>
> Most practicioers tend to avoid complex queries due to the performance
> issues. An implementation which makes performance independent to the
> complexity of the queries would be an important advance.

Based on your reasoning, we could index all attributes of each table in Oracle because that would speed up complex queries. Sure insertion, update, and deletion will be a bit sluggish, but who cares? Complex queries will _fly_! I hope you do understand the ironic nature of my remark. I want to bring to your attention the little but important fact that any storage organization should be usable for performing _simple_ queries too. Mainly because even joins use restriction on the result set and restriction on attribute values is called lookup. Ah, and joins also must reconstruct the records they want to return so it is useful if record reconstruction is not too slow, isn't it?

> >In fact, the two basic functions that I put to a test in my analysis
> >(lookup and record-reconstruction - RR) are such fundamental elements
> >that very few useful queries can be formulated without them.
>
> But you only analize one of the many possible embodiments.

The one that is described in the patent and advertised by Date. I am not the inventor nor a developer of the TRM. I just analyzed what had been proposed. Nevertheless, I _did_ try to come up with 'fixes' that did not require a complete overhaul of the TRM. (Because after an overhaul it would not be the TRM anymore.)

> The queries of your examples don't need a reconstruction table at all
> because you restrict over a single attribute.

EMP WHERE ENAME = 'Johnson'

will return a result set that contains all attributes of the EMP relation. I restrict on ENAME but I _do_ want to see all the attributes of each tuple in the result set. This is why you need the record reconstruction (RR) table.

> > Actually, I
> >cannot think of a single query that does not involve lookup and/or RR.
>
> -Queries on tables with merged columns.

True.

> -Queries on unidimensional tables.
> -Queries on tables with only one sorted attribute.

RR does not go away in the cases that you mention. It just becomes a trivial step. It is like a full table scan on a 'traditional' relation. Each attribute of a record can be found on the same disk block, so no extra effort is needed for RR. Think about it: RR simply means the step in the query execution where you actually get the attribute values of the tuple that you need for the query. It cannot be avoided in the overwhelming majority of queries.

> >The performance of lookup and RR cut to the heart of any storage
> >organization so negative results in this area are not something that
> >can be handled lightly.
>
> Unproved statement.
>

Unproved statement :)

> >Now this is a very good observation. In fact, closer examination of
> >the question of indexing reveals an interesting feature (?) of the
> >TRM: that the Field-Value tables _cannot_ easily use B-Trees or
> >Hashing for speeding up searches.
>
> Very vague. We could say the same about the traditional
> implementations. DBMS development was never an easy job.
>
> But to use the hashing technique described in the patent would be very
> easy.
>
> > In fact, the core ideas of the TRM
> >hinge on the positional addressing (meaning that they behave like
> >arrays) capability of the FV and RR tables.
>
> So what?
>
> We can build trees "indexed" by position.

Maybe you should read those passages again that deal with the problems that arise when you try to use a B-Tree index for the columns in the Field-Values (FV) tables. The biggest problem is the linked list nature of the RR table. When you change the position of some attribute value in one column of the FV table you have to update the links in the 'previous' column in the RR table so they point to the new positions.

This is the problem.

Until you show me a solution to the 'moving-values-break-links' problem I am going to stick to my assumption that the FV and RR tables cannot use B-Trees (or Hashing, or _any_ indexing technique that keeps changing the positions of the keys.) For this very same reason I will not address any counter-arguments that assume the usage of B-Trees for indexing FV tables (or RR tables for that matter.)

> In many if not most practical cases the most part of the queries only
> affect to a little part of the relations, and there are many
> attributes that don't need to be indexed (or sorted).

Actually, in my original analysis the TRM was competing against a traditional DBMS in which the EMP relation was indexed on E# and ENAME. If many "attributes don't need to be indexed" then what is the big deal about the TRM? The traditional approach is pretty fast (in fact, much faster than the TRM) if lookups are based only on the indexed attributes.

I hear you say 'JOINS!'. But have you noticed that index based joins are pretty efficient in traditional DBMSs too? Oh, and they do not have to store the entire relation in memory :)

> >is just an _additional_ cost. In the example I gave I analyzed the
> >insertion of a new Employee record into the EMP relation. Since an
> >Employee record has eight different attributes, the TRM has to insert
> >_each and every_ attribute value into its corresponding sorted set.
>
> Wrong again. This would be comparable to a traditional approach with
> all the attributes indexed, with several multi-attribute indices.

Date often referred to the TRM as an implementation that keeps "all attributes simultaneously sorted". So maybe this is not the case after all? Maybe?

> >The cost of performing this search is in itself prohibitive:
> >
> > ACNT * log (|EMP|/ACNT) = 8 * log (125) = 8 * 7 = 56 blocks
>
> This is not prohibitive. Only slower by a little constant factor.
>
> It would be compensated by an orders of magnitude gain in complex
> queries.

That is interesting. Complex queries give us negative time so we can re-gain what we have lost during inserts,updates and deletions? Again, why not put an index on all attributes in Oracle?

> >No matter what indexing technique the TRM chooses to use it will have
> >to perform a search followed by an insert for _each and every_
> >attribute value.
>
> No, the number of attributes to "sort" is optional, you might sort 0
> attributes and then you would have something very similar to a
> traditional table.

I rest my case.

Best Regards,

Josh Hewitt
Ph.D. Student
Florida Institute of Technology Received on Tue Nov 16 2004 - 19:55:45 CET

This message: [ Message body ]
Next message: Kenneth Downs: "Re: -> Database Relations for e commerce product catalogue"
Previous message: erk: "Re: Issues with the logical consistency of The Third Manifesto"
Maybe in reply to: Josh Hewitt: "The TransRelational Model: Clarifications"
In reply to Alfredo Novoa: "Re: The TransRelational Model: Clarifications"
Next in thread: Neo: "Re: Demo: Modelling Cost of Travel Paths Between Towns"
Reply: Alfredo Novoa: "Re: The TransRelational Model: Clarifications"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message