Re: Indexing techniques for hierarchies

From: Matt Wigdahl <>
Date: 3 Jan 2002 07:20:08 -0800
Message-ID: <> (James) wrote in message news:<>...

After having reviewed a bit more of your site, I regret even making the attempt to compare SQL Server to your XDb product. Comparing performance is invalid for several reasons, the primary one being that XDb is solely RAM-based and apparently doesn't interact with the disk at all. It seems obvious to me that a single-user, RAM-only C++ library will sometimes be able to outperform a general-purpose, commercial multi-user RDBMS with transaction control, concurrency control and recovery features, particularly on benchmarks custom-built to play to XDb's strengths, such as iterating over a linked list.

> In summary, each object is the instance of its class. Each object can
> have instances. Thus an object is an instance and can be a class at
> the same time.

I follow you so far.

> For example, Person is the instance of Root. Person is also the class
> for John, Mary and Doctor. Doctor is an instance of Person. Doctor is
> also the class for Dr. Brown and Dr. Smith. And so on. Each object
> always has a parent and may have a child. (Note: special rules apply
> to root which has neither class or parent). In terms of a rdb, this
> would be as if any record could also act as a table, unfortunately,
> rdbs are not this orthogonal.

I disagree with the "unfortunately" part, but I'm still following you...

> > You claim that each object was of a different class,
> > but apparently you set up one class template
> > and created 10000 object instances from it,
> > each one linked to the one before,
> Each object was of a different class and not the instance of one
> object.
> In a rdb, you know the "class" of a record by the table it is in.

Not necessarily. Only if you insist on the naive mapping of table=class, row=object without doing any actual domain analysis and data modeling.

> In XDb, the class of an object is the object from which it was
> instantiated.
> You mave have been confused because each object's class AND parent is
> the previous object. Why did I set it up this way? Only because the
> benchmark was very easy to create in this manner.

I was not confused. I merely made appropriate optimizations based on the code you provided. Although you claim your "classes" were all different, they all had the exact same characteristics, to wit:

  • all of them had an object id
  • all of them had a boolean parameter
  • all of them had a double parameter
  • all of them had a date parameter
  • all of them had a character parameter
  • all of them had a relationship to a parent object

Since all of these characteristics were the same, I made the optimization of storing all these "different" "classes" as rows in a single table, modeled as an adjacency list. To duplicate what you did in your benchmark, there was no need to invent 10000 identical tables.  That would have been stupid.

> > an impression reinforced by the fact
> > that all data members for your objects were the same.
> Because the 10,000 objects have the same properties
> does not mean each class is the same.

So what are the differences? Name? Not in this case. ID value? Yes. Well, I just made ID value a column in my single table rather than making a separate table for each ID value. Even if the names were different I'd just add another column to my single table. Other than ID and perhaps name (if you changed your benchmark code) there is _no_ substantive difference between classes.

> If you have 10,000 tables and
> each table have the same fields, does that mean the single record in
> the different tables are actually in the 1st table???

Your question is moot. There is no need for 10000 tables to do what your benchmark did.

> The benchmark provided is simlar to creating 10,000 tables each with
> the same fields. Why did I do it this way? Because the code to create
> the classes is simpler. If you ask I will recreate the benchmark with
> a different field name in each class to absolutely positively erase
> this doubt, but it won't affect the benchmark any.

Why not? Will it not affect the benchmark because you are effectively just getting "first boolean parameter", and "first double parameter", etc from your objects? If so, then the "field name" is nothing but a display convenience and can be factored out. I would add four label columns to my one table and be done with it.

> > This is just a degenerate case of an adjacency list.
> No it isn't. I hope the above clears this up. If not, I can re-explain
> and provide more examples.

You have not convinced me that what you did cannot be represented as an adjacency list.

> > I would certainly not approach this problem in an RDBMS
> > by creating 10000 different tables; that would be completely feebleminded.
> > In my opinion a better benchmarking method to compare the performace
> > of two DBMS would be to define some simple real-world tasks and let
> > each "side" generate the most optimal data model and data manipulation
> > code they can to satisfy the output requirements. Approaches such as
> > yours ("make 10000 tables", etc.) just lead to square-peg, round-hole
> > efforts to compare internal constructs when what is really needed is
> > to compare tangible results independent of implementation methodology.
> Because XDb is so fast, it required a large number of classes (aka
> tables) to get to 16 ms. A smaller number would make it look like 0
> ms.

All you are doing is iterating through a linked list in memory, with a few extra indirections to give you access to object variables without needing a hardcoded name. Of course it's fast.

> The benchmark you provided in SQL Server created 10,000 records in a
> single table, each the child of the previous. This is not an
> equivalent benchmark which really truely requires 10,000 tables each
> with the same fields and one record in each table and each record
> being the child of the record in the previous table.

No, this would be stupid. I've addressed this sufficiently above. My design is equivalent to yours. A linked list of 10000 instances whose members are all the same (even if the member _names_ are different) can be easily represented in a single relational table and preserve all functionality that you demonstrated in your benchmark.

> Luckily for rdbs, XDb is no faster when doing the benchmark either way
> because it is so orthogonal.

No, it's because this benchmark is so degenerate that it doesn't matter. It maps the same either way.

> XDb required 16 ms on a 500 Mhz Pentium
> III, NT 4.0, 256Mb.
> This is approximatley a 20-to-1 ratio.

Apples to oranges, since you have no disk storage, transactional isolation, locking/concurrency control, or multiuser access. Had I realized that you were just iterating over a linked list in memory ahead of time I wouldn't have bothered. BTW, my processor was a Pentium _II_, not _III_ as yours was. There is a significant performance difference over and above the MHz difference.

> A benchmark equivalent to the original using separate tables in SQL
> Server would yield a much higher ratio. I have a hard time convincing
> people that XDb resolves an object's relationship to it parent or
> class at a rate of 100,000,000 per second on a 800Mhz PC.

You are just traversing a linked list in memory. It doesn't surprise me that you can attain such speeds traversing a linked list in memory. Received on Thu Jan 03 2002 - 16:20:08 CET

Original text of this message