Re: The TransRelational Model: Performance Concerns

From: Ken O'F <kenof2001_at_yahoo.com>
Date: 19 Jan 2005 18:27:41 -0800
Message-ID: <1106188061.054127.263480_at_f14g2000cwb.googlegroups.com>


I am a friend and colleague of Steve Tarin, the inventor of TRM. I would like to post the following response to Josh Hewitt's original critique of TRM, posted on 11 November 2004 ("The Transrelational Model: Performance Concerns"):

As someone who is familiar with the TransRelational Model (TRM), including unpublished aspects, I feel compelled to respond to Josh Hewitt's misguided analysis of TRM, posted on 11 November 2004. His analysis was based on reading descriptions of TRM in Appendix A of C. J. Date's "An Introduction to Database Systems", Eighth Edition, together with published patents. These documents describe only part of TRM. In particular, they provide a basic description of the TRM model without disclosing those aspects of TRM implementations designed to handle disk-based or update operations most efficiently (a distinction that C. J. Date makes most explicit in his Appendix A). Materials on these latter topics currently exist only in manuscript form, most notably in C.J. Date's as yet unpublished book "Go Faster! The TransRelational Approach to DBMS Implementation".

In database design questions, as in other matters, it is important to differentiate between a model and an implementation. TRM, as described by Dr. Date, is a model - in fact it is a subset of the full TRM model. As with any model, there are many possible implementations of TRM. Mr. Hewitt's imagined implementation is one of these - a very poor one, by his own analysis.

The main problem with Mr. Hewitt's analysis is that he extrapolates from the partial published descriptions, which address in-memory operations, to an (incorrectly) imagined disk-based version of the design, and to some equally erroneous assumptions about update operations. By incorrect extrapolation, he sets up his own straw man, and then knocks it down.

Furthermore, Mr. Hewitt's straw man ignores some key aspects of the TRM design that are discussed in the original patent. With a more careful reading of this (and of C. J. Date's Appendix A), Mr. Hewitt might have avoided a lot of wasted effort pursuing false assumptions about the design of TRM. His analysis seems to indicate a mindset where he first developed a set of assumptions about how TRM worked, and then tried to make the facts fit his assumptions.

C.J. Date's Appendix A is tutorial in nature. In order to get across some of the most basic concepts of TRM in a clear and concise manner, and to keep his discussion within bounds, Dr. Date ignores updates and secondary storage considerations. As Dr. Date explains, his Appendix A adopts the "fiction that the database is (a) read-only and (b) in main memory." But he goes on to say: "Please do not conclude that TR[M] is good for read-only, main-memory databases only, however - it is not." Unfortunately, this is precisely what Mr. Hewitt does.

Mr. Hewitt's entire analysis makes the false assumption that TRM uses the same algorithms on disk that it uses in memory. He ignores the possibility of inventions beyond those described in the published documents - i.e. further complementary ideas in the TRM design, to handle disk and update operations. This brings to mind the error made by Rolls Royce rocket engineers in the late 1940s, who proved to themselves conclusively that it was impossible to launch a craft into space using a rocket engine, and thus that spaceflight was impossible. Although their calculations were correct, their conclusion was obviously wrong. Their mistake was not to consider the possibility of a companion invention - in this case the idea of a multi-stage rocket, which neatly solved the problem.

Mr. Hewitt's analysis of TRM is analogous to the Rolls Royce analysis of a single-rocket spacecraft. He provides a detailed analysis showing that a main-memory static implementation of TRM would perform poorly in a large disk-based dynamic environment, and from this concludes: "In its current form...TRM is not a feasible candidate for implementing a relational DBMS". The problem here is that his assumed implementation of disk and update operations bears little resemblance to the way TRM is actually implemented on disk. And his conclusion is terribly wrong.

Mr. Hewitt's analysis is clearly handicapped by the fact that he has not had access to the unpublished aspects of the TRM model, nor to any implementation based on the full TRM model. But, rather than imagine an implementation, and then go on to assess its performance and find it wanting, Mr. Hewitt should perhaps have asked himself whether there exists in real life any implementation of TRM that has been performance-tested. From some of the statements in Appendix A (e.g. "Performance of TR[M] is orders of magnitude better than it is with a direct-image system."), one might reasonably assume that Dr. Date has seen an implementation of TRM, and that he has assessed its performance to vastly exceed that of conventional DBMS implementations. This is, in fact, the case. Not only has Dr. Date been exposed to a full TRM implementation (a prototype built by Required Technologies, with update and disk operations), but so have a variety of other database researchers and implementers, including several potential marquee customers who have run benchmarks on the TRM prototype using their own real data. The results have been most impressive: in every benchmark case, without exception, blind tests have shown that TRM delivered many orders-of-magnitude performance improvements over existing RDBMSs in very large disk-based environments, over a broad variety of highly complex queries. These benchmarks can be demonstrated to anyone who is seriously interested in TRM, and will show that data does, in fact, stream from the disk at main-memory speeds, in a real-life environment.

In his imagined TRM implementation, Mr. Hewitt makes a series of design assumptions that lead inexorably to very poor performance. The Required Technologies implementation, which is based on the full TRM model, has a very different design than Mr. Hewitt's. While various business reasons have thus far caused this design to be maintained as a proprietary trade secret, the expectation is that this design will - hopefully in the relatively near future - be made available for public consumption. Moreover, several individuals (including both C.J. Date and myself) have already been exposed to the details of this design under the protection of a non-disclosure agreement. Without violating this agreement, I can disclose that TRM includes a variety of novel algorithms and data structures for handling disk-resident data, and that these typically result in database operations executing at or near main memory speeds, even when the overwhelming majority of the data is stored on disk.

Thus, it turns out that Mr. Hewitt's concern over issues such as disk splaying and binary searches on disk is unfounded. Interestingly, Dr. Date specifically referenced these matters in his Appendix A, where he writes: "In a disk-based system, will not the zigzagging mean a lot of random access and terrible performance? And can binary search be made to work efficiently on the disk?" ... "Such questions are indeed serious ones, but (sadly) this is not the place to address them; suffice it to say that they can be and the solutions have been implemented." I can add, while bound by non-disclosure, that I have been exposed to methods by which TRM's disk-based implementation avoids performing costly disk-based binary searches and that TRM's design includes techniques that neatly solve this problem. Furthermore, the TRM solution is far faster than B-trees, without the various negative characteristics that B-trees entail (as is verified by our benchmarks).

Note too that the full TRM design also includes novel approaches for handling update operations, and for maintaining good cache performance, thereby avoiding the insertion, deletion and cache coherency pitfalls that Mr. Hewitt imagines. While much of this area of TRM is still a trade secret, the original patent does, in fact, provide several specific methods relating to updates and cache performance. These already published aspects of TRM suffice to provide both rapid updateability and strong cache coherency. It is therefore quite curious that Mr. Hewitt never even considers these publicly disclosed techniques, despite the fact that they clearly provide superior update and cache performance, while avoiding all of the problems contained in the implementation imagined by Mr. Hewitt.

We would love to reveal more about TRM, but we cannot at the current time. The full TRM design is the intellectual property of Required Technologies, and, due to its extremely high value, can only be disclosed with adequate safeguards in place.

While Mr. Hewitt's analysis of TRM contains, at the very least, all the deficiencies discussed above, the greatest danger his analysis poses is that it leads one to the terribly erroneous conclusion that (in his words) "In its current form...TRM is not a feasible candidate for implementing a relational DBMS." Admittedly, Mr. Hewitt has only a partial description of TRM to go on. But his leap to such a conclusion is most unfortunate, because nothing could be further from the truth. In fact, one of the main reasons why relational advocates such as Dr. Date are so enthused about TRM is the exact opposite of this conclusion. To quote from Dr. Date's seminar description: "In a nutshell, the [TRM] allows us to build DBMSs that - at last! - truly deliver on the full promise of the relational model."

I'm also glad to report some potentially very good news here. Not only does the prototype implementation of TRM (referenced above) still exist, but also a full-blown commercial disk-based updatable RDBMS based on TRM (with standard SQL, ODBC, JDBC, and third-party tool interfaces) is nearly complete. But for certain business difficulties, this product would already exist. Even so, preliminary performance testing on this not-yet-complete product has been most impressive. Moreover, steps are currently being taken to resolve these business difficulties, thereby enabling the completion of this product - and hopefully the publication of C.J. Date's manuscript. I greatly hope that this comes about in the relatively near future. All will then be able to verify to their own satisfaction just how greatly TRM outperforms current-generation direct-image RDBMSs in all aspects, including disk-based and update operations.

Ken O'Flaherty
19 January 2005

Received on Thu Jan 20 2005 - 03:27:41 CET
