Re: Object databases beat joins (was: Re: ODMG Website?)

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Sun, 18 May 2003 00:53:35 GMT
Message-ID: <3EC6D9F7.7010204_at_earthlink.net>


Peter Koch Larsen wrote:

> "Carl Rosenberger" <carl_at_db4o.com> wrote in message news:<ba5sji$vih$00$1_at_news.t-online.com>...
> 

>>Mikito Harakiri wrote:
>>
>>>>Imagine you have a database with 2 billion hydrogen atoms, joined
>>>>together into one billion hydrogen molecules. If you have one atom
>>>>and want to find the corresponding one, your index solution will have
>>>>to walk the complete tree consisting of 2 billion index entries.
>>>>They may not fit into RAM, I am afraid. Alright then, if you have
>>>>a perfectly balanced tree on your hard disk, you will need at least
>>>>31 (2 ^ 31 = 2147483648) reads to your disk and 31 compare operations.
>>>>An object database typically references objects directly. Access time
>>>>is constant:
>>>>Walk one pointer!
>>>>...no matter if you have 10 atoms, 2 billion, or 42 fantastillions.
>>>>You can get away with one single read operation.
>>>>That's the best performance you can get.
>>>>
>>>>In huge databases, index construction and maintenance can also become
>>>>a performance problem.
>>>
>>>Wrong arithmetic. A typical B-tree having 1G nodes will be 4 levels deep. 3
>>>levels would typically be cached. Therefore, we have 1 random IO, the same
>>>as in the OODB case where the foreign key reference is expressed explicitly
>>>by a pointer.
>>
>>How many comparison operations will you have on average?
> 
> When storing 2 billion (2000000000) elements, to find a particular
> element in some ordered structure - such as a B-tree - you will need
> about 31 comparisons.

It's a pity to see numbers being waved around with no explanation of how they are justified. I started this response because I was somewhat sceptical of the numbers being quoted.

Let's do some calculation with explicitly stated assumptions, so people can dissect the answer or alter the assumptions to suit their experience.

Index page size = 2048 bytes		(probably too conservative - small)
Size of pointer to next node = 4 bytes	(probably an under-estimate)
Size of key for data = 4 bytes		(that is only just plausible)
Index page overhead = 32 bytes

Hence, in the root page, you can place up to (2048 - 32) / (4 + 4) = 252 entries. The second level of nodes therefore covers 252**2 nodes, the third level 252**3, and the fourth level 252**4 = 4,032,758,016, which is large enough to cover the 2E9 nodes hypothesized.

However, if you change the pointer size from 4 to 8 bytes and the key size from 4 to 8 bytes, then you can only fit 126 entries per page, and you need to go to five levels in the B-Tree (126**4 is about 2.5E8, well short of 2E9). Increase the key size beyond 48 bytes (36 entries per page, since each entry is then 56 bytes long - 48 bytes of key and 8 bytes of pointer) and you need to go to 6 levels of B-Tree to handle 2E9 leaf nodes (36**6 > 2E9, 35**6 < 2E9).

Changing the page size to 4096 doesn't alter the conclusions about levels very much: with 16-byte keys and 4-byte pointers you get 203 entries per page, and 203**4 is about 1.7E9 - just short of 2E9, so strictly you still need 5 levels - while with an entry size of 8 bytes (4 bytes key, 4 bytes pointer) 4 levels are ample. Reduce the page size to 1024 and you need 5 levels of B-Tree even with 4 byte pointers and 4 byte keys (124 entries per page, and 124**4 is about 2.4E8); with 16 byte keys, it would be 6 levels - and this was more like a scenario I'd had to deal with previously.
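As a sanity check, here is a small Python sketch (the function and parameter names are mine, purely illustrative) that reproduces these level counts under the stated assumptions, taking every page to be completely full (the best case):

    def btree_levels(n_entries, page_size=2048, key_size=4,
                     ptr_size=4, overhead=32):
        # Entries that fit on one index page under the assumptions above.
        fanout = (page_size - overhead) // (key_size + ptr_size)
        # Smallest depth d such that fanout**d >= n_entries.
        levels = 1
        while fanout ** levels < n_entries:
            levels += 1
        return levels, fanout

    for key, ptr in ((4, 4), (8, 8), (48, 8)):
        levels, fanout = btree_levels(2 * 10**9, key_size=key, ptr_size=ptr)
        print(f"{key}-byte key, {ptr}-byte pointer: "
              f"{fanout} entries/page, {levels} levels")

Running it prints 252 entries/page and 4 levels, 126 and 5, and 36 and 6 - matching the hand calculations; change page_size to 4096 or 1024 to reproduce the other cases.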

How about searching? That depends. On each level, you know the range of values covered and can do a binary search across the list of entries to determine which one is applicable. So, with 4 levels and 128-255 entries per page, you need up to 8 comparisons per level (binary search), for 4 * 8 = 32 comparisons in the worst case; with the 5-level B-Tree at 126 entries per page, 5 * 7 = 35 (up to 40 if the pages held 128-255 entries). With the 6-level B-Tree forced by 48-byte keys, you'd have 6 levels * ceil(log2(36)) = 6 * 6 = 36 comparisons worst case.
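The comparison counts fall out of the same sketch (again illustrative, reusing btree_levels from above): a binary search over a page holding fanout entries costs ceil(log2(fanout)) comparisons.

    import math

    def comparisons(n_entries, **assumptions):
        # Reuses btree_levels() from the sketch above.
        levels, fanout = btree_levels(n_entries, **assumptions)
        per_level = math.ceil(math.log2(fanout))  # binary search within one page
        return levels * per_level

    print(comparisons(2 * 10**9))                           # 4 * 8 = 32
    print(comparisons(2 * 10**9, key_size=8, ptr_size=8))   # 5 * 7 = 35
    print(comparisons(2 * 10**9, key_size=48, ptr_size=8))  # 6 * 6 = 36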

So, my initial scepticism is somewhat unwarranted - the calculations suggest that for plausible configurations, the B-Tree will be either 4 or 5 levels deep, and the number of comparisons might be as large as 40 but could be around 31-32.

Maybe all this was a waste of time on my part - but it would have been nice to see some justification for the values derived.

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/