Object databases beat joins (was: Re: ODMG Website?)

From: Carl Rosenberger <carl_at_db4o.com>
Date: Thu, 15 May 2003 23:18:16 +0200
Message-ID: <ba0vh0$7as$03$1_at_news.t-online.com>


andrewst wrote:
> > This is what I think is the fundamental problem with SQL (at least the
> > standard) is that it only supports scoping by type or table.
> > Since it only
> > supports relationships through joins it doesn't scale well as
> > the joins
> > become complex and the tables get very large. SQL expects me
> > to find my
> > children by looking among all the children in the world, rather
> > than looking
> > in my own house.
>
> You are having a laugh, are you not? Never heard of indexes?
> I may as well say "an ODBMS can't find a named child unless I know what
> house she lives in". Probably equally naive.

[...]

> Joins work just fine.

Paul and his crew provide the engine for some of the largest and most complex databases around. Your "Joins work just fine" statement may not hold at that scale: in huge databases you never have enough performance.

A very simple example:
Imagine you have a database with two billion hydrogen atoms, joined together into one billion hydrogen molecules. If you have one atom and want to find its partner, your index solution has to search a tree of two billion index entries. That tree may not fit into RAM, I am afraid. And even with a perfectly balanced binary tree on your hard disk, you will need up to 31 reads from disk and 31 compare operations, since 2^31 = 2147483648 is the first power of two above two billion.
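The arithmetic can be checked directly. A minimal sketch (the method name `btreeDepth` is mine, not from any library; note the post assumes a binary tree, while real B-trees have a much higher fan-out and therefore need far fewer disk reads):

```java
public class IndexDepth {
    // Worst-case depth of a perfectly balanced binary search tree
    // over n entries: the smallest d with 2^d >= n.
    public static int btreeDepth(long entries) {
        int depth = 0;
        long capacity = 1;
        while (capacity < entries) {
            capacity *= 2;
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        // Two billion index entries -> 31 levels, since 2^31 = 2147483648.
        System.out.println(btreeDepth(2_000_000_000L)); // 31
    }
}
```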

An object database typically references objects directly, so access time is constant:
Walk one pointer!
...no matter whether you have 10 atoms, 2 billion, or 42 fantastillions, you can get away with a single read operation. That's the best performance you can get.
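The "walk one pointer" claim can be sketched in plain Java. The `Atom` class and `buildMolecules` helper below are hypothetical; the point is only that following a stored reference costs one dereference, independent of how many molecules exist:

```java
import java.util.ArrayList;
import java.util.List;

public class DirectReference {
    // Hypothetical domain class: each atom stores a direct reference
    // to its partner, the way an object database stores an object ID.
    static class Atom {
        Atom partner; // one pointer: constant-time access
    }

    // Build 'count' two-atom molecules and return the first atom.
    static Atom buildMolecules(int count) {
        List<Atom> atoms = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Atom a = new Atom();
            Atom b = new Atom();
            a.partner = b;
            b.partner = a;
            atoms.add(a);
        }
        return atoms.get(0);
    }

    public static void main(String[] args) {
        Atom first = buildMolecules(1_000_000);
        // Finding the partner never touches an index, no matter how
        // many molecules exist: it is a single reference walk.
        System.out.println(first.partner.partner == first); // true
    }
}
```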

In huge databases, index construction and maintenance can also become a performance problem.

If you access objects on the intended reference path (and yes, there are applications that do that) you may get along completely without indices.

Joins have a problem with NULL: outer joins fill unmatched rows with NULLs, and SQL's three-valued logic makes those awkward to reason about.

Joins do not have a viewpoint on the data, so the optimizer has to guess where to start and what to do next.

If you have a deep class hierarchy, your queries can become very very complex.

Imagine an inheritance hierarchy of 10 classes. How many joins will you use?

With object databases, the query will remain dead-simple: Choose whatever extent from the hierarchy that you wish and query for it.

Just a sample:

Class1
Class2
...
...
Class9
Class10

If you want all data contained in a class 9 object:

Object database pseudocode:
database.get(Class9.class);

Relational pseudocode:
SELECT * FROM Class1, Class2, ... , Class9, Class10 WHERE Class1.pkey = Class2.fkey
AND Class2.pkey = Class3.fkey
...
...
AND Class8.pkey = Class9.fkey
AND Class9.pkey = Class10.fkey

Guess which is more work to write and to maintain. Guess which will run faster.
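The object-database side of the comparison can be sketched with a toy in-memory stand-in. The `Database` class below is my own illustration of a class-extent query, not db4o's actual API (which is richer); it only shows the shape of "name the class and you are done", including polymorphic retrieval:

```java
import java.util.ArrayList;
import java.util.List;

public class ExtentQuery {
    // Toy stand-in for an object database's class-extent query.
    static class Database {
        private final List<Object> objects = new ArrayList<>();

        void store(Object o) { objects.add(o); }

        // Return every stored instance of 'type', including subclasses.
        <T> List<T> get(Class<T> type) {
            List<T> result = new ArrayList<>();
            for (Object o : objects)
                if (type.isInstance(o)) result.add(type.cast(o));
            return result;
        }
    }

    static class Class1 {}
    static class Class9 extends Class1 {} // intermediate classes elided

    public static void main(String[] args) {
        Database db = new Database();
        db.store(new Class1());
        db.store(new Class9());
        // One call, no joins: the query names the class and is done.
        System.out.println(db.get(Class9.class).size()); // 1
        System.out.println(db.get(Class1.class).size()); // 2 (polymorphic)
    }
}
```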

> It's the object "data model" that's a cripple.

I wouldn't call it a "cripple" but I fully agree that it's not perfect.

It does work quite well for application development.

Type-safety (do you have that with relations?) and modern IDE tool support (is there anything to refactor relational models as quickly as you can refactor classes with Eclipse?) make development very, very efficient.

The commonly used object model may not be theoretically correct, but it gets the job done. In modern IT, minimizing development time and development cost matters more than a theoretically more correct data model. It's all about becoming profitable. Fast!

If you question the principal benefits of object-oriented development, you may be the one who's having a laugh.

X-post and Fup to comp.databases.theory. I feel like a whipping today.

Kind regards,
Carl

--
Carl Rosenberger
db4o - database for objects - http://www.db4o.com
Received on Thu May 15 2003 - 23:18:16 CEST
