Re: S.O.D.A. database Query API - call for comments

From: Carl Rosenberger <carl_at_db4o.com>
Date: Mon, 14 May 2001 17:40:24 +0200
Message-ID: <9doua1$ci5$01$1_at_news.t-online.com>


Philip Lijnzaad wrote:
> Your courage is indeed much admired :-)

Thanks!
My compliments in return for being a fair and enjoyable discussion partner.

> Carl> "Existing links" are defined by the class schema of the application.
> >>
> >> which is a bit rigid ... you can't do tricks like finding the gaps in a
> >> series of entries (see other thread) by doing a self join.
>
> Carl> Again hack against theory.
>
> why ? The theory is so general that it allows clever (and fast) 'hacks',
> because it is based on operating on sets of things rigth from start. I
> believe this is what makes it so suitable to large-scale data management.

The approaches definitely come from two totally different directions: - SQL *is* the language of sets and mass updates - Object orientation is the exact opposite. You define single object behaviour.

The two extremes both need the functionality of the other side: - Relational databases are headed towards "object-relational". Inheritance for tables is introduced. Akmal Chaudhri has recently posted an example on how Informix handles this to comp.databases.object. A side-note: Serializing objects to columns has nothing to do with "object-relational". It merely is "object-struct-reuse".
- Object databases also need declarative functionality for queries and updates.

Let's hope we can all learn from eachother to create the best of both worlds.

> Carl> I have said this before earlier in this thread: If you need a rigid
> Carl> system for day-to-day data management to maintain rather flat
> Carl> inheritance hierarchies and lots of simple records: Relational
> Carl> databases are probably the best choice.
>
> I object to the 'rigid' and 'flat'; the relational systems I have worked
 with
> are anything but rigid or flat.

Give a finger and you've lost a hand. :-) I meant "rigid" in a positive sense, as in "stable-in-every-day-use". Sorry if I choose the wrong words sometimes, since English is not my mother language. Replace "rigid" with "stable".

Forget "flat" also, if you wish. "Flat" and "heterogeneous" are a matter of standpoint.

> Why ? Using relational calculus, you 'massage' the relevant (parts of)
> entities into a set of data that is of interest to the business logic, in
 the
> most optimal way. The massaged (parts of) entities need not correspond to
 any
> proper object type known to the system, but that is irrelevant.

This is where I see one of the largest advantages of relational databases: You can always set up a two-dimensional table of your result, which is instantaniously readable by a human.

> The end
> result of a query or transaction will be something that is then expressed
 (or
> expressible) as objects. Relations expose every nook and cranny of the
 data,
> in the most general way; sometimes this is simply needed.

This is where I see one of the largest disadvantages of relational databases:
Reconstructing objects and the relation between them is not supported by the database engine. Once you start using a mapping tool to help, you build an *additional* in-memory-database of the objects that you have "checked-out" to the user-application. Ideally the additional object layer would also take care about locking issues. This is where object databases theoretically have great system-immanent advantages since they need no additional layer.

> Carl> How many tables would you use to represent an inheritance hierarchy
 of the
> Carl> following?
> Carl> - person
> Carl> - employee
> Carl> - manager
>
> Depends, but usually three. This can be 'horizontal' (only tables for
> leaf-classes; query super-classes through a UNION of a restriction of
 them)
> or vertical (each class its own table, with sub-classes having just the
 bits
> that are 'new', + a foreign key that is also a primary key; query
 sub-classes
> through a join of super- and sub-table).

This is a fair and honest answer. "Depends" shows that you have a problem on a decision, depending on application and performance issues. The need to take a decision already is the sign of a mismatch.

Thinking OO, the above is simple and straigth forward: manager extends employee extends person.

Using your second suggestion, the necessary queries for manager names looks like this:
- Relational:
select person.name from person, employee, manager   where employee.person = person.id
  and manager.employee = employee.id
- Object-oriented:
database.get(manager);

Now would you be so kind to write a query that completely puts all person objects back together again, with the following addition to the datamodel:

- customer extends person
- sponsor extends person
- VC extends sponsor
- self_employed extends employee

I bet, I will be faster:
database.get(person)

> That's not the point: the tables and primary/foreign keys guarantee you
 that
> you can query the most bizarre combinations of parts of entities,
 producing
> things that don't have to be any particular object. This generality is
> lacking in OODBMSs. So the 'separation' aspect is that RDMBSs do one thing
> (storing and querying), and do it well (i.e. in a very general way).

I fully agree.

Accordingly we would like to extend the query functionality for object databases.

> Carl> Maintainability is a matter of the implementation. Schema evolution
 is
> Carl> not a problem inherent to object datases in principle.
>
> I'm not so sure, because objects don't have the closure property. I.e.,
> products, projection, set operations and restrictions of relations are
 again
> relations. This symmetry is not available for objects.

To some extent, you may be right:
A table may exist on it's own. You can transfer it anywhere independantly. If you transfer a second table, the joins will be there again, simply because they only consist of numbers.

Object databases would transfer objects independantly. A complete restore of a large object network is only possible with access to OIDs. Ideally these would be unique worldwide (GUIDs).

Now what is the usecase?
Replication!

This is where GUIDs also become worthwhile using on relational systems.

> Carl> No: Reuse your object structures with a new logic if you wish.
>
> So what happens if a particular relationship changes from a one-to-many
> relationship into a many-to-many relationship ? You will have to update
 your
> object structure, not matter what (both in the case of an OODBMS as well
 as
> in the RDBMS + OR layer).

True.
OR layers can save you very much work. It might get your implementation work very close to using an object database.

I am just questioning the efficiency, performance, stability and resource consumption of an additional layer.

> Carl> I don't understand your
> Carl> round-trip-problem. This is dependant on the implementation.
>
> in theory yes (if you are clever enough to do query transformations), but
 in
> practice no. In practice, complicated queries end up being done by
 explicit
> pointer chasing, because that's the only thing available.

Query transformation and optimization is quite simple:

- think in sets
- identify reocurring sets
- evaluate one set at a time
- always use the smallest set first
- narrow larger sets by smaller sets

The principle will be the same for relational and for object databases.

Of course query evaluation can get more complex if object databases provide added language binding functionality like support for Arrays or Collections

> Apart from the
> inability to do convenient aggregations over the data, you loose the
 ability
> to work with sets of things ('things' not necessarily being
> objects). Compared to that, the round-trip problem is indeed minor.

"Sets of things" should be available on object databases by providing indices.

> Carl> Agreed, if an OR layer is used.
> Carl> OR layers produce lots of memory and performance overhead.
>
> no, not necessarily. In fact, frequently the OR layer is used to add some
> caching, improving the speed relative to the RDBMS. I will conceed that an
> OODBMS (which usually also does sophisticated chaching, it seems) has an
> advantage here.

Thanks!

Yes, OR layers can improve the speed by caching objects, but this only works for usecases, where a small and defined amounts of data are accessed. For typical High-End-Usecases you can assume that caches are always full.

I will conceed that most current object databases including our own perform badly, if query-evaluation hits cache limits. The future will see further development and better systems.

> Carl> As far as I am informed, views still are very slow.
>
> Where did you get this information from ?

  • Empirical research, two years ago.
  • Continous contact to my former company.

> A view is as fast as the query that
> defines it. So stating that a view is slower than a table may be true, but
> you're comparing apples and oranges.

I fully agree.
A view is *only* as fast as the query that defines it.

Sorry for my bad wording. The negative performance hit occurs when queries join views with other views or tables. Most (all ?) current query optimizers build all views first (run all underlying queries), before evaluating them against eachother.

> Carl> Relational databases have only just started to allow indices on
 views.
>
> Views can (and do) use the same indexes that the tables they consist of
> have, so I don't know what you're trying to say here.

A little more than a year ago MSSQL has added indexed views. The desired effect: Views can run faster than the queries that define them, because joins are pre-evaluated.

Kind regards,
Carl

---
Carl Rosenberger
db4o - database for objects - http://www.db4o.com
Received on Mon May 14 2001 - 17:40:24 CEST

Original text of this message