Re: S.O.D.A. database Query API - call for comments

From: Philip Lijnzaad <lijnzaad_at_ebi.ac.uk>
Date: 14 May 2001 14:42:00 +0100
Message-ID: <u73da8t5cn.fsf_at_sol6.ebi.ac.uk>


Carl> Now this is slowly becoming a full-time-job:

Carl> Since the industrial use of object databases compares to relational
Carl> databases in a ratio of worse than 1 to 100, I should eventually find
Carl> myself arguing against more than 100 people here. 

Your courage is indeed much admired :-)

Carl> We can argue from two sides here:
Carl> - What is ideally done to have a clean system?
Carl> - What is the fastest and quickest way in practice?

Carl> I know that practical performance techniques lead you to very horrible
Carl> constructs in relational databases. Sometimes primary keys contain lots of
Carl> information, even information about inheritance structures, to reduce the Carl> need for joins and indices.

they shouldn't.

Carl> "Existing links" are defined by the class schema of the application.
>>
>> which is a bit rigid ... you can't do tricks like finding the gaps in a
>> series of entries (see other thread) by doing a self join. One can argue
 Carl> that
>> this is hack, but in practice, such hack often save the day, in day-to-day
>> data management.

Carl> Again hack against theory.

why ? The theory is so general that it allows clever (and fast) 'hacks', because it is based on operating on sets of things rigth from start. I believe this is what makes it so suitable to large-scale data management.

Carl> I have said this before earlier in this thread: If you need a rigid
Carl> system for day-to-day data management to maintain rather flat
Carl> inheritance hierarchies and lots of simple records: Relational
Carl> databases are probably the best choice.

I object to the 'rigid' and 'flat'; the relational systems I have worked with are anything but rigid or flat.

Carl> I don't disagree with your "every-day-best-hack-policies" at all. I have
Carl> used them in the past myself but lots of code became less and less
Carl> maintainable.

That can be a problem, but the alternative is to keep migrating an object system to the changing needs, or to write cumbersome loops that are had for free using a RDBMS.

Carl> - Relational databases need multiple tables to represent inheritance
Carl> hierarchies. Queries tend to get lots and lots of joins. The access
Carl> pattern
Carl> becomes very difficult to understand.

>>
>> you don't always _have_ to build complex joins if you don't want to; you
 Carl> have
>> the _possibility_, which brings great extra speed (far fewer round-trips)
>> and extra power (sophisticated queries and aggregates functions).

Carl> This is another "best-hack" against "clean-object-model" argument.

Why ? Using relational calculus, you 'massage' the relevant (parts of) entities into a set of data that is of interest to the business logic, in the most optimal way. The massaged (parts of) entities need not correspond to any proper object type known to the system, but that is irrelevant. The end result of a query or transaction will be something that is then expressed (or expressible) as objects. Relations expose every nook and cranny of the data, in the most general way; sometimes this is simply needed.

Carl> - With object databases you can query against a single object class,
Carl> no
Carl> matter how deep the inheritance hierarchy is.

>>
>> So you can with RDBMSs, at the expense of expressiveness.

Carl> I don't understand how.

Carl> How many tables would you use to represent an inheritance hierarchy of the
Carl> following?
Carl> - person
Carl> - employee
Carl> - manager

Depends, but usually three. This can be 'horizontal' (only tables for leaf-classes; query super-classes through a UNION of a restriction of them) or vertical (each class its own table, with sub-classes having just the bits that are 'new', + a foreign key that is also a primary key; query sub-classes through a join of super- and sub-table).

Carl> Using two paradigms makes an application more complex.

>> But this presupposes that you can treat the data and the 'business logic'
>> as one unit, which is highly debatable. It does make sense to decouple
>> storage and business logic. Application object attempt to roll the data
>> and the logic into one, which is useful for the application programmers,
>> but you loose a substantial amount of data manageability (malleability) by
>> insisting that data adhere to the objects.

Carl> Relational people always come up with this argument, questioning if Carl> coupling of data and logic makes sense. I have never understood it.

Carl> Object classes provide two perfectly separatable types of information:
Carl> - members define the data structure
Carl> - methods define behaviour

Carl> Where is the problem of separating members and methods, should you ever
Carl> wish to do so? You simply delete all methods and reuse the data Carl> structure for another task.

That's not the point: the tables and primary/foreign keys guarantee you that you can query the most bizarre combinations of parts of entities, producing things that don't have to be any particular object. This generality is lacking in OODBMSs. So the 'separation' aspect is that RDMBSs do one thing (storing and querying), and do it well (i.e. in a very general way).

>> arguably, because there is code to maintain. But several writers have
>> attested the nightmare (or impossibility) of schema evolution using an
>> OODBMS, so the trade apears to be ease of application development
>> vs. adaptability of schema
>>
>> or put negatively,
>>
>> rigidity of schema vs. having to maintain a mapping

Carl> Maintainability is a matter of the implementation. Schema evolution is Carl> not a problem inherent to object datases in principle.

I'm not so sure, because objects don't have the closure property. I.e., products, projection, set operations and restrictions of relations are again relations. This symmetry is not available for objects.

Carl> - more implementation work = higher cost
>>
>> no: you keep the data management more isolated (orthogonal) from the
>> application logic, which makes the data more usuable in the future.

Carl> No: Reuse your object structures with a new logic if you wish.

So what happens if a particular relationship changes from a one-to-many relationship into a many-to-many relationship ? You will have to update your object structure, not matter what (both in the case of an OODBMS as well as in the RDBMS + OR layer).

Carl> For the future we want a query language with the same power (the original Carl> point of this thread, if you remember).

OK, we seem to have drifted. I have to say the original OQL did look very promising (simplyfying many relational queries as well).

Carl> I don't understand your
Carl> round-trip-problem. This is dependant on the implementation.

in theory yes (if you are clever enough to do query transformations), but in practice no. In practice, complicated queries end up being done by explicit pointer chasing, because that's the only thing available. Apart from the inability to do convenient aggregations over the data, you loose the ability to work with sets of things ('things' not necessarily being objects). Compared to that, the round-trip problem is indeed minor.

>> No, of course not: you have one object layer, and all the application code
>> uses just that. You only have to adjust the object-relational layer.

Carl> Agreed, if an OR layer is used.
Carl> OR layers produce lots of memory and performance overhead.

no, not necessarily. In fact, frequently the OR layer is used to add some caching, improving the speed relative to the RDBMS. I will conceed that an OODBMS (which usually also does sophisticated chaching, it seems) has an advantage here.

>> And BTW, much of this can be minimized by having the object-relational
 Carl> layer
>> access the RDBMS mostly thru relational views. Views offer a device to
 Carl> avoid
>> changes in the schema (like (de)normalization, or schema evolution)
 Carl> affecting
>> application code; they can buy you precious convenience and time, which is
>> simply unavailable in an OODBMS system.

Carl> As far as I am informed, views still are very slow.

Where did you get this information from ? A view is as fast as the query that defines it. So stating that a view is slower than a table may be true, but you're comparing apples and oranges.

Carl> Relational databases have only just started to allow indices on views.

Views can (and do) use the same indexes that the tables they consist of have, so I don't know what you're trying to say here.

Or do you mean updateable join views ? They have indeed only recently (Oracle 7.5.4 I believe, ... came out 1995 or so) become available, and not universally.

Or do you mean indexes on functions of column values (e.g., indexing case-insenstively)? They are in fact trivial to implement, but not very widely available.

Cheers,

                                                                      Philip
-- 
If you have a procedure with 10 parameters, you probably missed some. (Kraulis)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm A2-08
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD,  GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53
Received on Mon May 14 2001 - 15:42:00 CEST

Original text of this message