Re: Indexing techniques for hierarchies
Date: 2 Jan 2002 22:37:26 -0800
Message-ID: <a6e74506.0201022237.5b8c4af0_at_posting.google.com>
> I am not familiar with XDb syntax but I am very familiar with C++, and
> it seems like some of your statements below are incorrect.
Let us prove it either way.
I have posted the database for the benchmark with 10 objects instead of 10,000 to reduce the download size and also because the general pattern in the remaining 9,990 objects is the same. See www.xdb1.com/Benchmark/7.asp
Some of the concept in XDb are non-existant in rdbs,
therefore it would be helpful to read
www.xdb1.com/Basics/ClassSimple.asp
In summary, each object is the instance of its class. Each object can
have instances. Thus an object is an instance and can be a class at
the same time.
For example, Person is the instance of Root. Person is also the class
for John, Mary and Doctor. Doctor is an instance of Person. Doctor is
also the class for Dr. Brown and Dr. Smith. And so on. Each object
always has a parent and may have a child. (Note: special rules apply
to root which has neither class or parent). In terms of a rdb, this
would be as if any record could also act as a table, unfortunately,
rdbs are not this orthogonal.
> You claim that each object was of a different class,
> but apparently you set up one class template
> and created 10000 object instances from it,
> each one linked to the one before,
Each object was of a different class and not the instance of one
object.
In a rdb, you know the "class" of a record by the table it is in.
In XDb, the class of an object is the object from which it was
instantiated.
In code, o.cls returns it class. To find an object's class via the db
interface, click on 'Cls' button. The focus moves to its class.
In the posted db above, in Root/MyData/Test, expand down to the 10th
'Test' object and click 'Cls' btn. It will move you to the 9th object.
Click 'Cls' btn again and it moves you to the 8th object, and so on.
If the class of each object was the same, clicking the Cls btn should
have moved you to the same object each time. It doesn't.
In the benchmark, the parent of each object is also the previous object. Although visually obvious, to verify this, expand down to the 10th object and click the 'Par' btn. It move to the prior object.
You mave have been confused because each object's class AND parent is the previous object. Why did I set it up this way? Only because the benchmark was very easy to create in this manner.
Cut the 10th object and Paste it under MyData Folder. With the 10th object selected, clicking 'Par' btn moves to its parent the MyData Folder. With the 10th object selected, clicking 'Cls' moves to its class the 9th object.
Another way to verify that there are 10 classes and not 10 instances of the same class is to switch to the class/instance hierarchy via the View Menu. By default you are looking at data in the parent/child hierarchy.
XDb is so orthogonal, that there is little difference in the way it handles cls/instance vs parent/child hierarchies.
In the original code, each object is the instance of the prior object and also the child of the prior object. In XDb, each object must have both. The Inst_add_ function requires the class and parent of the new object. During each loop, both are changed as pNew changes during each loop.
pNew = pTest;
for (int x=1; x < LOOPS; x++){
pNew = pO_m->Inst_add_(pNew, // cls
pNew); // parent}
To create that which you thought I did, make the small change shown below, where the class remains pTest during each loop.
for (int x=0; x < LOOPS; x++){
pNew = pO_m->Inst_add_(pTest, // cls
pNew); // parent}
In this case each instance would have the same class and would be equivalent to 10,000 records in one table.
> an impression reinforced by the fact
> that all data members for your objects were the same.
Because the 10,000 objects have the same properties does not mean each class is the same. If you have 10,000 tables and each table have the same fields, does that mean the single record in the different tables are actually in the 1st table???
The benchmark provided is simlar to creating 10,000 tables each with the same fields. Why did I do it this way? Because the code to create the classes is simpler. If you ask I will recreate the benchmark with a different field name in each class to absolutely positively erase this doubt, but it won't affect the benchmark any.
> This is just a degenerate case of an adjacency list.
No it isn't. I hope the above clears this up. If not, I can re-explain
and provide more examples.
> I would certainly not approach this problem in an RDBMS
> by creating 10000 different tables; that would be completely feebleminded.
> In my opinion a better benchmarking method to compare the performace
> of two DBMS would be to define some simple real-world tasks and let
> each "side" generate the most optimal data model and data manipulation
> code they can to satisfy the output requirements. Approaches such as
> yours ("make 10000 tables", etc.) just lead to square-peg, round-hole
> efforts to compare internal constructs when what is really needed is
> to compare tangible results independent of implementation methodology.
Because XDb is so fast, it required a large number of classes (aka tables) to get to 16 ms. A smaller number would make it look like 0 ms.
> For the record, I would stab at the problem in in SQL Server like so
> WHILE (_at_count < 10000)
> BEGIN
> INSERT INTO Adj (Parent_ID) VALUES (_at_parent_id)
> SELECT _at_parent_id = MAX(ID) FROM Adj
> SET _at_count = @count + 1
> END
>
> WHILE (_at_cur_id IS NOT NULL)
> BEGIN
> SELECT _at_parent_id = Parent_ID, @bool = Bool, @dub = Dub, @date =
> Date, _at_text = Text
> FROM Adj
> WHERE ID = _at_cur_id
>
> SET _at_cur_id = @parent_id
> END
>
> 420 ms on a Pentium II 400, SQL Server 7.0/NT 4.0, 256 MB RAM.
Luckily for rdbs, XDb is no faster when doing the benchmark either way because it is so orthogonal. XDb required 16 ms on a 500 Mhz Pentium III, NT 4.0, 256Mb.
This is approximatley a 20-to-1 ratio.
> Since you don't actually do anything with the intermediate traversal
> variables, I took the liberty of optimizing the SQL further to do what
> your benchmark code ultimately does -- namely, retrieve the root
> node's variables:
> SELECT _at_cur_id = MAX(ID) FROM Adj
> SET _at_date_start = GETDATE()
> SELECT _at_cur_id = ID, @bool = Bool, @dub = Dub, @date = Date, @text =
> Text FROM Adj WHERE ID = (SELECT MIN(ID) FROM Adj)
> SET _at_date_end = GETDATE()
> This code runs in between 0 and 10 ms on the same platform and matches
> your real-world final state.
Holy cow! The pNew points to the 10,000th objects at the end of the loop that creates them
pNew = pTest;
for (int x=1; x < LOOPS; x++){
pNew = pO_m->Inst_add_(pNew, // cls
pNew); // parent}
and then pNew is transfered to pX before the bottom loop which traverses them
pX = pNew;
while (pX != pTest){
// Get pX's props and put in local vars
...
// Move to parent
pX = pX->parent_();
}
and pX->parent moves up the parent/child hierarchy to the 1st Test Class. So you can see that "don't actually do anything with the intermediate traversal variables" is inaccurate and your optimization is not eqivalent.
What you are saying is that it took SQL Server between 0-10ms to find the minimum id in a single table of 10,000 records and then to access the fields of the record with the min id.
What I am saying is that XDb accessed all the fields of 10,000 objects each of a different class arrranged as a child of the previous in 16 ms. Received on Thu Jan 03 2002 - 07:37:26 CET