Re: Where is the TRM at?

From: Frank Hamersley <terabitemightbe_at_bigpond.com>
Date: Thu, 24 Nov 2005 01:29:06 GMT
Message-ID: <CH8hf.2359$ea6.2112_at_news-server.bigpond.net.au>


paul c wrote:
> FrankHamersley wrote:
>
>> With all this nullvel (sic) gazing I got to wondering where the "Trans
>> Relational Model" is these days?
>>
>> Curiously it did seem much like the Sybase IQ product (on a
>> superficial examination only) - accordingly does the "patent" have
>> prior art problem lurking in its tea leaves?
>>
>> Cheers, Frank.
>
> I don't know the answer and I wonder the same thing. I'm not entirely
> thick since I once was able to read a patent about an electrical device
> and could more or less follow it. However that patent was issued around
> 1917 and was only thirteen pages of large type with nice diagrams. I
> tried to read the TRM patents as well as CJ Date's simple example (which
> he noted was a partial implementation and not the general one) but most
> of it was lost on me. From the scanty info on the web, my guess is that
> there are probably money problems even if there aren't prior art problems.
>

G'day Paul. Sadly I haven't got the time leading up to the weekend to respond adequately to your comprehensive post! I will address it a bit later, except to state that my interest was majorly piqued by how much reputation Date and Darwen staked on the TRM being the next big thing!

Cheers, Frank.

> Somebody (on this group, I think) criticized TRM (or at least the few
> published examples) for poor disk performance which was apparently
> disputed on the grounds that he used the wrong examples to come to his
> conclusion, if I got it right. But I think TRM is provocative in a
> useful way as well.
>
>
> Ignoring what happens on disk, the two messages I managed to extract
> from looking at the TRM were that 1) the 'zig-zag' algorithms allow a
> kind of indexing on all attributes which should make many joins faster
> as well as reducing the number of sorts that might be needed in general
> and 2) there are two basic physical data structures, at least early on
> in the patents and in CJ Date's example, loosely one with values and one
> with pointers. So it seems that each 'column-row' intersection stores a
> value somewhere and a pointer somewhere.
>
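From my own superficial reading, the two structures might be sketched in toy form like so - this is my guess at the mechanics from Date's simple example, not the patented implementation. Each column is kept in sorted order (the "values" structure), and a parallel table of positions (the "pointers" structure) lets you zig-zag from a cell in one column to the same record's cell in the next:

```python
def build_trm(rows):
    """Toy sketch: a field-values table (each column independently
    sorted) plus a reconstruction table of zig-zag pointers."""
    ncols = len(rows[0])
    # record indices in sorted-value order, per column
    order = [sorted(range(len(rows)), key=lambda r: rows[r][j])
             for j in range(ncols)]
    fvt = [[rows[r][j] for r in order[j]] for j in range(ncols)]
    # pos[j][record] = position of that record within column j's sort
    pos = [{r: p for p, r in enumerate(order[j])} for j in range(ncols)]
    # rrt[j][p] points to the same record's cell in the next column
    # (wrapping around, so the pointers form a ring per record)
    rrt = [[pos[(j + 1) % ncols][order[j][p]] for p in range(len(rows))]
           for j in range(ncols)]
    return fvt, rrt

def reconstruct(fvt, rrt, p):
    """Follow the zig-zag starting at position p of column 0."""
    rec = []
    for j in range(len(fvt)):
        rec.append(fvt[j][p])
        p = rrt[j][p]
    return tuple(rec)
```

Since every column is sorted, you get the "index on every attribute" effect: a binary search on any fvt column finds a starting cell, and the ring of pointers recovers the whole record.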
>
> This started me on planning out a simple-minded 'table' manager. I say
> 'table' instead of relation because although it would have some of the
> features of the relational model, it wouldn't necessarily have them all. It
> seems to me that one can make quite a useful memory-based db engine
> based on conventional indexing that in a steady-state would be just as
> compact, ie., the amount of memory needed would be roughly the same as
> in the TRM scheme. Admittedly, the TRM scheme is over my head, but I
> don't see that storing a table row-wise or 'image-wise' and having a
> separate index for each column is as feeble-minded as one might conclude
> from what's been written about TRM. After all, the row-wise scheme does
> keep the 'relation' between values in a row or tuple, for free. If the
> rows are all the same size, at least at some level, it also can make
> frequent operations such as row inserts and deletes fairly
> storage-efficient. Infrequent operations such as adding or removing
> columns would need memory working space beyond the minimum needed for
> any particular index and 'table' storage scheme. Regurgitating a
> 'table' would be quite easy, and joins, which are often such a bugaboo
> because the fear of their poor performance leads to bizarre schema
> designs, would be relatively painless with all those indexes.
>
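A bare-bones version of such a 'table' manager, row-wise storage plus one index per column, might look like this. A toy sketch only - dict-based indexes standing in for b-trees, and none of the relational niceties:

```python
class Table:
    """Rows stored whole, plus one value->row-ids index per column."""
    def __init__(self, columns):
        self.columns = columns
        self.rows = {}                       # row id -> tuple of values
        self.next_id = 0
        self.indexes = [dict() for _ in columns]

    def insert(self, row):
        rid = self.next_id
        self.next_id += 1
        self.rows[rid] = row
        for idx, v in zip(self.indexes, row):
            idx.setdefault(v, set()).add(rid)
        return rid

    def delete(self, rid):
        row = self.rows.pop(rid)
        for idx, v in zip(self.indexes, row):
            idx[v].discard(rid)

    def lookup(self, col, value):
        j = self.columns.index(col)
        return [self.rows[r] for r in self.indexes[j].get(value, ())]

def join(t1, c1, t2, c2):
    """Index nested-loop join: probe t2's index on c2 for each t1 row."""
    j1, j2 = t1.columns.index(c1), t2.columns.index(c2)
    out = []
    for row in t1.rows.values():
        for rid in t2.indexes[j2].get(row[j1], ()):
            out.append(row + t2.rows[rid])
    return out
```

With every column indexed, the join never scans the inner table, which is the "relatively painless" property claimed above; the price is exactly the per-column index overhead discussed next.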
>
> Taking a simple case, let all 'rows' be fixed-length, containing
> integers or floats whose length is at least the length of the pointer
> needed for each value. If you have a b-tree-style index that never
> stores values, just pointers, then the optimum overhead for such a db
> would be the number of 'intersection' values times the pointer size, say
> 4 bytes on a typical PC. Because memory latency is so insignificant
> when compared to disk latency, such an index doesn't really need to
> store values. Of course the blocks of memory would rarely be optimum,
> so you might want to say that the average overhead could be double that,
> say 8 bytes per value, or double the size of an integer on most PC's.
> Still, for double floats, it would be the same size. There would be
> some additional overhead for memory management, but if one is careful in
> choosing memory chunk sizes and how they are linked, I think this would
> amount to a fraction of a byte per intersection value. As for strings,
> they could be stored in a fixed-length row 'slot' but not necessarily,
> they could be 'heaped' and pointed to by a pointer in the fixed-length
> 'row' slot. The ratio of string index and string memory-structure
> overhead to the average size of the strings in an app might well be less
> than the ratio for integers and floats.
>
>
> So, I'm imagining a db, say for some small application that has 100
> tables, with average 10 attributes each, with average values about 10
> bytes long, and say 100,000 rows per table. That would be 10 million
> rows altogether and at 100 bytes for row values, about 1GB of values.
> This particular mess would have something like 100 million
> intersection values; at say 5 bytes of optimum pointer and memory
> management overhead per value, or double that, say 10 bytes on average,
> we'd need about 2GB of memory. Such a db would almost fit in the
> application memory of something like Windows 2000, but it seems a
> stretch to me.
>
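For what it's worth, the back-of-envelope numbers do add up, assuming the ~5 bytes of optimum pointer-plus-management overhead per intersection value, doubled for slack:

```python
# Sizing the hypothetical db above: 100 tables x 100,000 rows
# x 10 attributes, ~10-byte values, ~10 bytes overhead per
# intersection value (2 x (4-byte pointer + ~1 byte management)).
TABLES, ROWS_PER_TABLE, ATTRS = 100, 100_000, 10
VALUE_BYTES, OVERHEAD_BYTES = 10, 10

rows = TABLES * ROWS_PER_TABLE          # 10 million rows
values = rows * ATTRS                   # 100 million intersection values
data_bytes = values * VALUE_BYTES       # ~1 GB of raw values
index_bytes = values * OVERHEAD_BYTES   # ~1 GB of index overhead
total_gb = (data_bytes + index_bytes) / 1e9
```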
>
> But what I've noticed over the years is that many useful db's are much
> smaller than this, in terms of actual data, especially some of the
> old-fashioned ones such as airline reservation systems, not to mention
> the typical video or grocery store or even many web apps such as
> password lookups. If I cut the above row count in half, I'm down to 1GB
> or so. If I cut my database down to 1 million rows across all tables,
> we're down to about 200 MB. Even if I add to this app some fairly big
> table with a million rows all by itself, I'm using about 400 MB. That's
> territory that many PC's have today. It's really the number of rows and
> attributes that matters for such a db, not the number of 'tables'. Try
> keeping 1 or 2 million rows in your Excel 'database'.
>
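The downsizing arithmetic works out the same way if you fold it into one formula, keeping the same assumed 10-byte values and 10-byte per-intersection overhead:

```python
def db_bytes(rows, attrs=10, value_bytes=10, overhead_bytes=10):
    """Total memory: raw values plus per-intersection index overhead."""
    return rows * attrs * (value_bytes + overhead_bytes)

baseline = db_bytes(10_000_000)              # ~2 GB, as above
halved = db_bytes(5_000_000)                 # half the rows: ~1 GB
small = db_bytes(1_000_000)                  # 1M rows total: ~200 MB
# plus one fairly big million-row table of its own: ~400 MB
with_big_table = small + db_bytes(1_000_000)
```

Which bears out the point: row and attribute counts drive the memory bill; the number of 'tables' barely matters.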
>
> I still haven't said anything about concurrency, eg. locking and the
> interleaving of updates but I think this can be dealt with by some
> unconventional thinking about table and db constraints, allowing such an
> engine to be single-threaded. As for persistence, such a db would read
> everything in some form at start-up and write a redo-log during
> operation. Its goal would be to keep the cpu 100% busy, given
> sufficient user activity, minus whatever time the redologging kept it
> idle. I think garbage-collection can mostly be built into the basic
> algorithms. Maybe this would hurt concurrency in an app that is mostly
> updates, which is a good argument for developing it on a slow cpu!
>
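The persistence scheme described - read everything at start-up, append a redo log during operation - can be sketched in a few lines. Again a toy of my own, not anything from the TRM patents; the format and handler names are made up:

```python
import json
import os

class RedoLog:
    """Minimal append-only redo log: replay it at start-up,
    append each committed change during operation."""
    def __init__(self, path):
        self.path = path
        self.f = open(path, "a")

    def append(self, op, *args):
        self.f.write(json.dumps([op, *args]) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())   # durable before acknowledging
        # this fsync is the "time the redologging keeps the cpu idle"

    def replay(self, handlers):
        """Rebuild in-memory state by re-applying every logged op."""
        with open(self.path) as f:
            for line in f:
                op, *args = json.loads(line)
                handlers[op](*args)
```

Keeping the engine single-threaded makes the log trivially ordered; batching several operations per fsync would be the obvious way to claw back the idle time.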
>
> I think the whole area of using memory in a different way is interesting
> and the idea of smaller applications doesn't bother me personally. In
> fact, I often wonder whether a bunch of small apps isn't a better way,
> project-management-wise, to get things done with fewer surprises.
>
>
> Anyway, I know I didn't answer the question, but that's what looking at
> TRM made me think. Don't know how far I'll get, but I think it's a
> niche that's worth a try.
>
> p
>
Received on Thu Nov 24 2005 - 02:29:06 CET
