Re: Reinventing the TransRelational Model?

From: paul c <toledobythesea_at_oohay.ac>
Date: Wed, 12 Apr 2006 16:36:40 GMT
Message-ID: <s0a%f.6055$7a.3904_at_pd7tw1no>


falcon wrote:
> I have been trying to understand the TransRelational model myself. I
> don't think simply having column based tables make the TR model.
> Sybase IQ has column based table. KDB has column based tables (Dennis
> Shasha calls them Arrables or array tables). MonetDB is not only based
> on 'columns' but there is an actual open source implementation
> (http://monetdb.cwi.nl).
>
> The most interesting research I have seen is something called DODO.
> There are some papers:
> --The Dodo Query Flattening System
> (http://www.ub.utwente.nl/webdocs/ctit/1/0000010d.pdf )
> --A general approach to query flattening
> (http://db.cs.utwente.nl/Publications/PaperStore/db-utwente-42D3A95D.pdf
> )
> There are some others which I can't find right now.
>
> -Falcon
>

I too suspect whatever it does is not entirely specified in the patents.

   I vaguely remember something to the effect of the "method" not being affected by different implementations that "somebody skilled in the art" might think of. Apparently (I read somewhere) the patent lawyers avoid words like 'algorithm' because those can't be patented.

Still from the patents, my limited understanding is that it certainly goes into more detail than just describing a column-based store. The zig-zag structure/method is certainly unusual but I don't know if it's novel. I also don't think that indexing all values is novel nor combining an index with links that replace the common row-based impl'ns.   When I looked at trm from the point-of-view of a memory-based 'store', as opposed to the disk-based strawman that was critized, I could see no way to avoid some pretty heavy updating overhead, unless a lot more pointer plumbing was used, plus a fair amount of churning for projections on tables with lots of columns. My desire was to minimize storage as well as to avoid a garbage collection component. Maybe there are clever ways to deal with the updating overhead that are beyond me but I concluded that for ten million rows and hundred million values and a bunch of btrees, i could get away with less than 5 byte storage overhead per value, getting the 'adjacency' effect for free, albeit at increased cost to add or delete columns.

(In the course of this I also had in mind to see if I could leverage the much-critized Intel segment feature, for example, if one's database is say, 200 MB in size, a pointer needs only 28 bits, not 32. From what I can understand, the current operating systems prevent one from using segments in appl'n code - what was originally a retrograde feature would be useful today if there was a little more foresight - too bad.)

I also noticed that Mr. Bayer, co-inventor of the btree which was never patented, has applied for a patent on his clever new 'all-column' (my words) index - i forget the name but it might be 'z+tree'. Where would we be today if he'd patented the btree or if Mr. Hoare had patented quicksort?

The ctree project seemed pretty derivative to me, for example it's occurred to lots of people to stick compression routines on top of indexes.

pc Received on Wed Apr 12 2006 - 18:36:40 CEST

Original text of this message