Re: Where is the TRM at?

From: paul c <toledobythesea_at_oohay.ac>
Date: Wed, 23 Nov 2005 19:33:54 GMT
Message-ID: <Cu3hf.589248$tl2.8179_at_pd7tw3no>


FrankHamersley wrote:
> With all this nullvel (sic) gazing I got to wondering where the "Trans
> Relational Model" is these days?
>
> Curiously it did seem much like the Sybase IQ product (on a superficial
> examination only) - accordingly does the "patent" have prior art problem
> lurking in its tea leaves?
>
> Cheers, Frank.

I don't know the answer and I wonder the same thing. I'm not entirely thick since I once was able to read a patent about an electrical device and could more or less follow it. However that patent was issued around 1917 and was only thirteen pages of large type with nice diagrams. I tried to read the TRM patents as well as CJ Date's simple example (which he noted was a partial implementation and not the general one) but most of it was lost on me. From the scanty info on the web, my guess is that there are probably money problems even if there aren't prior art problems.

Somebody (on this group, I think) criticized TRM (or at least the few published examples) for poor disk performance which was apparently disputed on the grounds that he used the wrong examples to come to his conclusion, if I got it right. But I think TRM is provocative in a useful way as well.

Ignoring what happens on disk, the two messages I managed to extract from looking at the TRM were that 1) the 'zig-zag' algorithms allow a kind of indexing on all attributes, which should make many joins faster as well as reduce the number of sorts needed in general, and 2) there are two basic physical data structures, at least early on in the patents and in CJ Date's example: loosely, one with values and one with pointers. So it seems that each 'column-row' intersection stores a value somewhere and a pointer somewhere.
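To make sure I understood those two structures, I sketched them in a few lines of Python. This is only my reading of Date's worked example, with names I made up (fvt for the sorted field values, rrt for the pointer table); the real thing is surely cleverer:

```python
# Toy sketch of the two TRM structures as I understand them: a field
# values table (every column sorted independently) and a reconstruction
# table of row pointers that "zig-zag" through the columns to rebuild
# one record.  My naming, not the patents'.

def build_trm(records):
    ncols = len(records[0])
    # order[j][i] = which record supplies row i of sorted column j
    order = [sorted(range(len(records)), key=lambda r: records[r][j])
             for j in range(ncols)]
    # fvt[j] is column j's values in sorted order
    fvt = [[records[r][j] for r in order[j]] for j in range(ncols)]
    # inv[j][r] = where record r's column-j value landed in fvt[j]
    inv = [[0] * len(records) for _ in range(ncols)]
    for j in range(ncols):
        for i, r in enumerate(order[j]):
            inv[j][r] = i
    # rrt[j][i] points from fvt[j][i] to the same record's row in column j+1
    rrt = [[inv[(j + 1) % ncols][order[j][i]] for i in range(len(records))]
           for j in range(ncols)]
    return fvt, rrt, inv

def reconstruct(fvt, rrt, inv, r):
    """Zig-zag from column to column, collecting one record's values."""
    i, out = inv[0][r], []
    for j in range(len(fvt)):
        out.append(fvt[j][i])
        i = rrt[j][i]
    return tuple(out)

rows = [("smith", 20), ("jones", 10), ("adams", 30)]
fvt, rrt, inv = build_trm(rows)
print(reconstruct(fvt, rrt, inv, 1))  # -> ('jones', 10)
```

With every column sorted, a range or equality probe on any attribute is a binary search, which is where the "index on everything" flavour comes from.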

This started me on planning out a simple-minded 'table' manager. I say 'table' instead of relation because although it would have some of the features of the relational model, it wouldn't necessarily have all of them. It seems to me that one can make quite a useful memory-based db engine based on conventional indexing that in a steady state would be just as compact, i.e., the amount of memory needed would be roughly the same as in the TRM scheme. Admittedly, the TRM scheme is over my head, but I don't see that storing a table row-wise or 'image-wise' and having a separate index for each column is as feeble-minded as one might conclude from what's been written about TRM. After all, the row-wise scheme keeps the 'relation' between values in a row or tuple for free. If the rows are all the same size, at least at some level, it also can make frequent operations such as row inserts and deletes fairly storage-efficient. Infrequent operations such as adding or removing columns would need memory working space beyond the minimum needed for any particular index and 'table' storage scheme. Regurgitating a 'table' would be quite easy, and joins, which are often such a bug-a-boo because fear of their poor performance leads to bizarre schema designs, would be relatively painless with all those indexes.
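Here's roughly what I have in mind, sketched in Python. It's my own toy, not TRM: whole rows stored once, plus one index per column (a dict standing in for the b-tree), which is enough to show why joins stop being scary:

```python
# Minimal sketch of the row-wise 'table' manager: rows kept whole so the
# relation between values comes for free, plus an index per column.

class Table:
    def __init__(self, columns):
        self.columns = columns
        self.rows = {}                            # row id -> tuple
        self.index = {c: {} for c in columns}     # col -> value -> row ids
        self._next = 0

    def insert(self, row):
        rid = self._next; self._next += 1
        self.rows[rid] = row
        for c, v in zip(self.columns, row):
            self.index[c].setdefault(v, set()).add(rid)
        return rid

    def delete(self, rid):
        row = self.rows.pop(rid)
        for c, v in zip(self.columns, row):
            self.index[c][v].discard(rid)

    def lookup(self, col, value):
        return [self.rows[r] for r in self.index[col].get(value, ())]

def join(t1, c1, t2, c2):
    """Index-to-index equijoin: walk one index, probe the other."""
    out = []
    for v, ids in t1.index[c1].items():
        for r2 in t2.index[c2].get(v, ()):
            out.extend(t1.rows[r1] + t2.rows[r2] for r1 in ids)
    return out
```

The join never touches a row that doesn't match, since both sides go index-to-index; that's the "relatively painless" part.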

Taking a simple case, let all 'rows' be fixed-length, containing integers or floats whose length is at least the length of the pointer needed for each value. If you have a b-tree-style index that never stores values, just pointers, then the optimum overhead for such a db would be the number of 'intersection' values times the pointer size, say 4 bytes on a typical PC. Because memory latency is so insignificant compared to disk latency, such an index doesn't really need to store values. Of course the blocks of memory would rarely be optimum, so you might want to say that the average overhead could be double that, say 8 bytes per value, or double the size of an integer on most PCs. Still, for double floats, it would be the same size. There would be some additional overhead for memory management, but if one is careful in choosing memory chunk sizes and how they are linked, I think this would amount to a fraction of a byte per intersection value. As for strings, they could be stored in a fixed-length row 'slot', but not necessarily; they could be 'heaped' and pointed to by a pointer in the fixed-length 'row' slot. The ratio of string index and string memory-structure overhead to the average size of the strings in an app might well be less than the ratio for integers and floats.
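A pointers-only index might look something like this sketch (mine, and simplified: I materialize the values for the binary search, where a real b-tree would dereference each pointer at comparison time, which is cheap in memory):

```python
# Sketch of a "pointers only" index: the index holds row ids sorted by
# the value they point at, so its steady-state cost is one pointer per
# indexed value.  A sorted list stands in for the b-tree.
import bisect

class PointerIndex:
    def __init__(self, rows, col):
        self.rows, self.col = rows, col      # rows: row id -> tuple
        self.ids = sorted(rows, key=lambda r: rows[r][col])

    def find(self, value):
        """Return the row ids whose indexed value equals `value`."""
        # a real implementation would compare by chasing the pointer;
        # here we just pull the keys out for bisect
        keys = [self.rows[r][self.col] for r in self.ids]
        lo = bisect.bisect_left(keys, value)
        hi = bisect.bisect_right(keys, value)
        return self.ids[lo:hi]
```

So the index itself costs len(rows) pointers, i.e. 4 bytes per intersection value on the 32-bit PCs I'm talking about, before allocator slop.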

So, I'm imagining a db, say for some small application that has 100 tables, with an average of 10 attributes each, with average values about 10 bytes long, and say 100,000 rows per table. That would be 10 million rows altogether and, at 100 bytes of values per row, about 1GB of values. This particular mess would have something like 100 million intersection values, and at say 5 bytes of optimum pointer and memory-management overhead per value, or double that, say 10 bytes on average, we'd need about 2GB of memory. Such a db would almost fit in the application memory of something like Windows 2000, but it seems a stretch to me.
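Checking my own arithmetic (note that to get 10 million rows across 100 tables, each table needs about 100,000 rows):

```python
# Back-of-the-envelope sizing for the imaginary db above: 100 tables,
# 10 attributes each, ~10-byte values, and ~10 bytes of pointer plus
# allocator overhead per intersection value.

tables, attrs, rows_per_table = 100, 10, 100_000
value_bytes, overhead_bytes = 10, 10

rows = tables * rows_per_table                  # 10 million rows
intersections = rows * attrs                    # 100 million values
value_mem = intersections * value_bytes         # ~1GB of raw values
total_mem = intersections * (value_bytes + overhead_bytes)

print(rows, intersections, value_mem, total_mem)
# 10000000 100000000 1000000000 2000000000  -- i.e. about 2GB
```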

But what I've noticed over the years is that many useful db's are much smaller than this, in terms of actual data, especially some of the old-fashioned ones such as airline reservation systems, not to mention the typical video or grocery store or even many web apps such as password lookups. If I cut the above row count in half, I'm down to 1GB or so. If I cut my database down to 1 million rows across all tables, we're down to about 200 MB. Even if I add to this app some fairly big table with a million rows all by itself, I'm using about 400 MB. That's territory that many PC's have today. It's really the number of rows and attributes that matters for such a db, not the number of 'tables'. Try keeping 1 or 2 million rows in your Excel 'database'.

I still haven't said anything about concurrency, e.g. locking and the interleaving of updates, but I think this can be dealt with by some unconventional thinking about table and db constraints, allowing such an engine to be single-threaded. As for persistence, such a db would read everything in some form at start-up and write a redo-log during operation. Its goal would be to keep the cpu 100% busy, given sufficient user activity, minus whatever time the redo-logging kept it idle. I think garbage collection can mostly be built into the basic algorithms. Maybe this would hurt concurrency in an app that is mostly updates, which is a good argument for developing it on a slow cpu!
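The persistence side could be as dumb as this sketch (assumptions all mine): a key-value stand-in for the engine that replays its redo-log at start-up and appends to it on every update:

```python
# Sketch of the start-up-replay-plus-redo-log persistence idea for a
# single-threaded, memory-resident engine.
import json, os

class Engine:
    def __init__(self, logfile):
        self.data = {}                     # the in-memory "db"
        self.logfile = logfile
        if os.path.exists(logfile):        # start-up: replay the redo-log
            with open(logfile) as f:
                for line in f:
                    op, k, v = json.loads(line)
                    self._apply(op, k, v)
        self.log = open(logfile, "a")

    def _apply(self, op, k, v):
        if op == "put":
            self.data[k] = v
        elif op == "del":
            self.data.pop(k, None)

    def put(self, k, v):
        # log first, then apply; the flush is the only time the cpu waits
        self.log.write(json.dumps(["put", k, v]) + "\n")
        self.log.flush()
        self._apply("put", k, v)

    def delete(self, k):
        self.log.write(json.dumps(["del", k, None]) + "\n")
        self.log.flush()
        self._apply("del", k, None)
```

A real version would checkpoint and truncate the log now and then so start-up doesn't replay forever, but the shape is the same.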

I think the whole area of using memory in a different way is interesting, and the idea of smaller applications doesn't bother me personally. In fact, I often wonder whether a bunch of small apps isn't a better way, project-management-wise, to get things done with fewer surprises.

Anyway, I know I didn't answer the question, but that's what looking at TRM made me think. Don't know how far I'll get, but I think it's a niche that's worth a try.

p Received on Wed Nov 23 2005 - 20:33:54 CET
