Re: S.O.D.A. database Query API - call for comments

From: Carl Rosenberger <carl_at_db4o.com>
Date: Mon, 14 May 2001 14:19:35 +0200
Message-ID: <9doii5$n73$04$1_at_news.t-online.com>


Now this is slowly becoming a full-time job:

Since the industrial use of object databases compared to relational databases stands at a ratio worse than 1 to 100, I should expect to find myself arguing against more than 100 people here. Is this some kind of conspiracy to keep me from doing development work? :-)

! Please help me, all you object gurus reading along !

Philip Lijnzaad wrote:
> Carl> From what I have learned about relational systems, choosing a primary
> Carl> key that does model something in the real world, is bad practice, since
> Carl> you run into terrible trouble, if the used system changes in the real
> Carl> world (e.g. changed employee id pattern).
>
> true, but public identifiers that behave more or less like primary keys are a
> fact of life (even though in most cases you shouldn't use them as real keys
> in a database).

We can argue from two sides here:
- What is ideally done to have a clean system?
- What is the fastest approach in practice?

I know that performance tuning in practice leads to some truly horrible constructs in relational databases. Sometimes primary keys encode lots of information, even information about inheritance structures, to reduce the need for joins and indices.

> Carl> "Existing links" are defined by the class schema of the application.
>
> which is a bit rigid ... you can't do tricks like finding the gaps in a
> series of entries (see other thread) by doing a self join. One can argue that
> this is a hack, but in practice, such hacks often save the day in day-to-day
> data management.

Again: hack versus theory.

I said this earlier in this thread: if you need a rigid system for day-to-day management of rather flat inheritance hierarchies and lots of simple records, relational databases are probably the best choice.

I don't disagree with your "every-day-best-hack-policies" at all. I have used them myself in the past, but a lot of that code became less and less maintainable.

> Carl> You are trying to use the deficiencies of relational databases as an
> Carl> argument for them:
> Carl> - Relational databases need multiple tables to represent inheritance
> Carl> hierarchies. Queries tend to get lots and lots of joins. The access pattern
> Carl> becomes very difficult to understand.
>
> you don't always _have_ to build complex joins if you don't want to; you have
> the _possibility_, which brings great extra speed (far fewer round-trips)
> and extra power (sophisticated queries and aggregate functions).

This is another "best-hack" against "clean-object-model" argument.

With a fast, functional object database you simply don't worry about this: you model reality with your objects.
Adding an inheritance level has no negative impact on performance. If anything, it improves queries, because it adds another class category to query against.

> Carl> - With object databases you can query against a single object class, no
> Carl> matter how deep the inheritance hierarchy is.
>
> So you can with RDBMSs, at the expense of expressiveness.

I don't understand how.
How many tables would you use to represent the following inheritance hierarchy?

- person
- employee
- manager
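For concreteness, here is that hierarchy as a minimal Java sketch. The list used as a store and the `query` method are hypothetical stand-ins, not db4o's API: filtering by class with `Class.isInstance` returns every `Employee` and `Manager` when you ask for `Person`, however deep the hierarchy grows. A relational mapping of the same three classes typically needs a union over three tables, joins between them, or one wide table with nullable columns, depending on the mapping strategy chosen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for an object store; not the db4o API.
public class HierarchyQuery {
    static class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static class Employee extends Person {
        final double salary;
        Employee(String name, double salary) { super(name); this.salary = salary; }
    }

    static class Manager extends Employee {
        final List<Employee> reports = new ArrayList<>();
        Manager(String name, double salary) { super(name, salary); }
    }

    // Returns every stored object that is an instance of `type`,
    // including instances of all of its subclasses, at any depth.
    static <T> List<T> query(List<Object> store, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object o : store) {
            if (type.isInstance(o)) {
                result.add(type.cast(o));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> store = new ArrayList<>();
        store.add(new Person("Ann"));
        store.add(new Employee("Bob", 50_000));
        store.add(new Manager("Cid", 90_000));

        System.out.println(query(store, Person.class).size());   // 3
        System.out.println(query(store, Employee.class).size()); // 2
        System.out.println(query(store, Manager.class).size());  // 1
    }
}
```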


> Carl> Using two paradigms makes an application more complex.
>
> But this presupposes that you can treat the data and the 'business logic' as
> one unit, which is highly debatable. It does make sense to decouple storage
> and business logic. Application objects attempt to roll the data and the logic
> into one, which is useful for the application programmers, but you lose a
> substantial amount of data manageability (malleability) by insisting that
> data adhere to the objects.

Relational people always come up with this argument, questioning whether coupling data and logic makes sense. I have never understood it.

Object classes provide two perfectly separable types of information:
- members define the data structure
- methods define behaviour

Where is the problem of separating members and methods, should you ever wish to do so? You simply delete all methods and reuse the data structure for another task.
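A minimal Java sketch of that separation, assuming a hypothetical `Employee` class: the standard reflection call `Class.getDeclaredFields()` yields only the data members, so the data structure can be read (or reused) without touching any of the methods.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class DataMembers {
    // Hypothetical class mixing data (fields) and behaviour (methods).
    static class Employee {
        String name;
        double salary;

        void raise(double percent) { salary *= 1 + percent / 100; }
        String describe() { return name + " earns " + salary; }
    }

    // Returns the instance fields declared on `type` as "Type name" strings:
    // its data structure, independent of any methods it declares.
    // Note: the reflection API does not guarantee field order.
    static List<String> dataMembers(Class<?> type) {
        List<String> members = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                members.add(f.getType().getSimpleName() + " " + f.getName());
            }
        }
        return members;
    }

    public static void main(String[] args) {
        System.out.println(dataMembers(Employee.class));
    }
}
```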

> arguably, because there is code to maintain. But several writers have
> attested the nightmare (or impossibility) of schema evolution using an
> OODBMS, so the trade appears to be
> ease of application development vs. adaptability of schema
>
> or put negatively,
>
> rigidity of schema vs. having to maintain a mapping

Maintainability is a matter of implementation. In principle, schema evolution is not a problem inherent to object databases. As I have written before, we store a superset of all object schemas ever used.
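The following is a hypothetical Java sketch of the superset idea, not db4o's actual implementation: the stored schema only ever gains fields, each with a default value, so an object written under an older class version can be read by a newer one and vice versa. Records are modelled as plain maps, and the field names and defaults are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of storing a superset of all class versions' schemas.
public class SupersetSchema {
    // Union of every field ever declared, mapped to its default value.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    // Registering a class version only adds fields; none are ever removed,
    // so records written under any earlier version stay readable.
    void register(Map<String, Object> fieldDefaults) {
        fieldDefaults.forEach(fields::putIfAbsent);
    }

    // Reads a stored record as a given class version sees it: missing fields
    // get their default, fields the version no longer declares are ignored.
    Map<String, Object> read(Map<String, Object> stored, List<String> versionFields) {
        Map<String, Object> view = new LinkedHashMap<>();
        for (String name : versionFields) {
            view.put(name, stored.containsKey(name) ? stored.get(name) : fields.get(name));
        }
        return view;
    }

    public static void main(String[] args) {
        SupersetSchema schema = new SupersetSchema();
        schema.register(Map.of("name", ""));              // version 1
        schema.register(Map.of("name", "", "email", "")); // version 2 adds email

        Map<String, Object> oldRecord = Map.of("name", "Ann");
        System.out.println(schema.read(oldRecord, List.of("name", "email"))); // {name=Ann, email=}

        Map<String, Object> newRecord = Map.of("name", "Bob", "email", "bob@example.com");
        System.out.println(schema.read(newRecord, List.of("name")));          // {name=Bob}
    }
}
```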

This discussion has taught me that declarative update statements are needed.

> Carl> - more implementation work = higher cost
>
> no: you keep the data management more isolated (orthogonal) from the
> application logic, which makes the data more usable in the future.

No: reuse your object structures with new logic if you wish.

> Carl> - worse performance
>
> for a very limited class of queries, yes, perhaps. For all others, RDBMSs
> seem to perform nicely, or much faster (through the lack of round trips), and
> offer much greater power. And just to repeat myself, for properly indexed
> tables, the speed of a join is O(R + S), R the number of resulting rows, S
> the number of rows in the smallest table. It does not depend on the number of
> tables or total numbers of rows.
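One way to reach the quoted O(R + S) bound for a two-table join is an index nested-loop join: scan the smaller table once and probe an index on the larger one. A minimal Java sketch, assuming a hash index that the database already maintains (the `Row` record and sample data are hypothetical). With a hash index each probe is expected O(1); with a B-tree index each probe costs O(log N) in the size N of the indexed table, so the bound becomes O(S log N + R).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an equi-join driven by the smaller table, probing a
// hash index on the larger table's join column.
public class IndexedJoin {
    record Row(int key, String value) {}

    // Builds a hash index key -> rows. In a DBMS the index already exists and
    // is maintained on insert/update, so this cost is not paid per query.
    static Map<Integer, List<Row>> buildIndex(List<Row> table) {
        Map<Integer, List<Row>> index = new HashMap<>();
        for (Row r : table) {
            index.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        return index;
    }

    // One expected-O(1) probe per row of the smaller table plus O(1) work per
    // emitted pair: expected cost O(S + R), independent of the larger table's size.
    static List<String> join(List<Row> smaller, Map<Integer, List<Row>> largerIndex) {
        List<String> out = new ArrayList<>();
        for (Row s : smaller) {
            for (Row l : largerIndex.getOrDefault(s.key(), List.of())) {
                out.add(s.value() + "/" + l.value());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> depts = List.of(new Row(1, "R&D"), new Row(2, "Sales"));
        List<Row> emps = List.of(new Row(1, "Ann"), new Row(1, "Bob"), new Row(3, "Cid"));
        System.out.println(join(depts, buildIndex(emps))); // [R&D/Ann, R&D/Bob]
    }
}
```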

If we talk about RDBMSs vs ODBMSs as they exist today, I have no arguments against you.

For the future we want a query language with the same power (the original point of this thread, if you remember). I don't see your round-trip problem; it depends on the implementation.

The ultimate speed comparison is a matter of the implementation of indices. There should not be a great difference between systems here.

Object databases handle index generation much more nicely, since they "understand" what objects are. Accordingly, there is no need for network round trips to generate keys that "join" table records.

> Carl> ...and of course more misplaced data.
>
> ?

Sorry; I could also have used question marks to comment on Lee's statements. I might do that in the future. "Misplaced data" was just a hit back with the same weapon.

> Carl> You have to extend two models:
> Carl> - the application class scheme
> Carl> - the database table system
>
> Carl> You also have to correct all queries and mappings
>
> yes; that's inevitable (although large parts can be automated). How are you
> going to do this in an OODBMS ?

Since we only use objects, the compiler automatically detects problems.

In the future there will be Source-Code-In-Database (SCID) development environments.

If you want to change the name of a member of an object: You only need one central change. Everything else will be taken care of automatically.

If you want to change an inheritance hierarchy: Again you need one central change. A wizard GUI will help you define how your object data is to be morphed.

If you want to change the datatype of an object member (long to int or double):
Just do it. Everything else will be taken care of automatically.
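Even when such a conversion is automatic, the database has to decide what to do with values the new type cannot hold. A hypothetical Java sketch (not db4o's actual code) of converting a stored `long` member on read: `long` to `double` always succeeds but may lose precision for magnitudes above 2^53, while `long` to `int` uses the standard `Math.toIntExact`, which throws `ArithmeticException` on overflow.

```java
// Hypothetical sketch of converting a stored member value when its declared
// type changes; not db4o's schema-evolution code.
public class MemberTypeChange {
    // Converts a stored long to the member's new type.
    // long -> double never fails but may round large values;
    // long -> int throws ArithmeticException if the value does not fit.
    static Object convert(long stored, Class<?> newType) {
        if (newType == long.class) return stored;
        if (newType == double.class) return (double) stored;
        if (newType == int.class) return Math.toIntExact(stored);
        throw new IllegalArgumentException("no conversion to " + newType);
    }

    public static void main(String[] args) {
        System.out.println(convert(42L, int.class));    // 42
        System.out.println(convert(42L, double.class)); // 42.0
        try {
            convert(5_000_000_000L, int.class);
        } catch (ArithmeticException e) {
            System.out.println("does not fit in int");
        }
    }
}
```

A real implementation would need a policy for the failing case (reject the change, clamp, or ask the developer), which is the kind of decision a migration wizard could surface.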

The higher the integration, the better all this will work. Eventually relational databases will fall off the cliff.

> Carl> by working through all applications by hand.
>
> No, of course not: you have one object layer, and all the application code
> uses just that. You only have to adjust the object-relational layer.

Agreed, if an OR layer is used.

OR layers add considerable memory and performance overhead.

> And BTW, much of this can be minimized by having the object-relational layer
> access the RDBMS mostly thru relational views. Views offer a device to avoid
> changes in the schema (like (de)normalization, or schema evolution) affecting
> application code; they can buy you precious convenience and time, which is
> simply unavailable in an OODBMS system.

As far as I know, views are still very slow. Relational databases have only just started to allow indices on views, and every index, of course, costs time on inserts and updates.

Kind regards,
Carl

---
Carl Rosenberger
db4o - database for objects - http://www.db4o.com
Received on Mon May 14 2001 - 14:19:35 CEST
