Re: S.O.D.A. database Query API - call for comments
Date: 14 May 2001 14:42:00 +0100
Message-ID: <u73da8t5cn.fsf_at_sol6.ebi.ac.uk>
Carl> Now this is slowly becoming a full-time-job:
Carl> Since the industrial use of object databases compares to relational Carl> databases in a ratio of worse than 1 to 100, I should eventually find Carl> myself arguing against more than 100 people here.
Your courage is indeed much admired :-)
Carl> We can argue from two sides here: Carl> - What is ideally done to have a clean system? Carl> - What is the fastest and quickest way in practice? Carl> I know that practical performance techniques lead you to very horrible Carl> constructs in relational databases. Sometimes primary keys contain lots ofCarl> information, even information about inheritance structures, to reduce the Carl> need for joins and indices.
they shouldn't.
Carl> "Existing links" are defined by the class schema of the application.
>>
>> which is a bit rigid ... you can't do tricks like finding the gaps in a
>> series of entries (see other thread) by doing a self join. One can argue
Carl> that
>> this is hack, but in practice, such hack often save the day, in day-to-day
>> data management.
Carl> Again hack against theory.
Carl> I have said this before earlier in this thread: If you need a rigid Carl> system for day-to-day data management to maintain rather flat Carl> inheritance hierarchies and lots of simple records: Relational Carl> databases are probably the best choice.
I object to the 'rigid' and 'flat'; the relational systems I have worked with are anything but rigid or flat.
Carl> I don't disagree with your "every-day-best-hack-policies" at all. I have Carl> used them in the past myself but lots of code became less and less Carl> maintainable.
That can be a problem, but the alternative is to keep migrating an object system to the changing needs, or to write cumbersome loops that are had for free using a RDBMS.
Carl> - Relational databases need multiple tables to represent inheritance Carl> hierarchies. Queries tend to get lots and lots of joins. The access Carl> pattern Carl> becomes very difficult to understand.
>>
>> you don't always _have_ to build complex joins if you don't want to; you
Carl> have
>> the _possibility_, which brings great extra speed (far fewer round-trips)
>> and extra power (sophisticated queries and aggregates functions).
Carl> This is another "best-hack" against "clean-object-model" argument.
Why ? Using relational calculus, you 'massage' the relevant (parts of) entities into a set of data that is of interest to the business logic, in the most optimal way. The massaged (parts of) entities need not correspond to any proper object type known to the system, but that is irrelevant. The end result of a query or transaction will be something that is then expressed (or expressible) as objects. Relations expose every nook and cranny of the data, in the most general way; sometimes this is simply needed.
Carl> - With object databases you can query against a single object class, Carl> no Carl> matter how deep the inheritance hierarchy is.
>>
>> So you can with RDBMSs, at the expense of expressiveness.
Carl> I don't understand how.
Carl> How many tables would you use to represent an inheritance hierarchy of the Carl> following? Carl> - person Carl> - employee Carl> - manager
Depends, but usually three. This can be 'horizontal' (only tables for leaf-classes; query super-classes through a UNION of a restriction of them) or vertical (each class its own table, with sub-classes having just the bits that are 'new', + a foreign key that is also a primary key; query sub-classes through a join of super- and sub-table).
Carl> Using two paradigms makes an application more complex.
>> But this presupposes that you can treat the data and the 'business logic'
>> as one unit, which is highly debatable. It does make sense to decouple
>> storage and business logic. Application object attempt to roll the data
>> and the logic into one, which is useful for the application programmers,
>> but you loose a substantial amount of data manageability (malleability) by
>> insisting that data adhere to the objects.
Carl> Relational people always come up with this argument, questioning if Carl> coupling of data and logic makes sense. I have never understood it.
Carl> Object classes provide two perfectly separatable types of information: Carl> - members define the data structure Carl> - methods define behaviour Carl> Where is the problem of separating members and methods, should you everCarl> wish to do so? You simply delete all methods and reuse the data Carl> structure for another task.
That's not the point: the tables and primary/foreign keys guarantee you that you can query the most bizarre combinations of parts of entities, producing things that don't have to be any particular object. This generality is lacking in OODBMSs. So the 'separation' aspect is that RDMBSs do one thing (storing and querying), and do it well (i.e. in a very general way).
>> arguably, because there is code to maintain. But several writers have
>> attested the nightmare (or impossibility) of schema evolution using an
>> OODBMS, so the trade apears to be ease of application development
>> vs. adaptability of schema
>>
>> or put negatively,
>>
>> rigidity of schema vs. having to maintain a mapping
Carl> Maintainability is a matter of the implementation. Schema evolution is
Carl> not a problem inherent to object datases in principle.
I'm not so sure, because objects don't have the closure property. I.e.,
products, projection, set operations and restrictions of relations are again
relations. This symmetry is not available for objects.
Carl> - more implementation work = higher cost
>>
>> no: you keep the data management more isolated (orthogonal) from the
>> application logic, which makes the data more usuable in the future.
Carl> No: Reuse your object structures with a new logic if you wish.
So what happens if a particular relationship changes from a one-to-many relationship into a many-to-many relationship ? You will have to update your object structure, not matter what (both in the case of an OODBMS as well as in the RDBMS + OR layer).
Carl> For the future we want a query language with the same power (the original Carl> point of this thread, if you remember).
OK, we seem to have drifted. I have to say the original OQL did look very promising (simplyfying many relational queries as well).
Carl> I don't understand your
Carl> round-trip-problem. This is dependant on the implementation.
in theory yes (if you are clever enough to do query transformations), but in practice no. In practice, complicated queries end up being done by explicit pointer chasing, because that's the only thing available. Apart from the inability to do convenient aggregations over the data, you loose the ability to work with sets of things ('things' not necessarily being objects). Compared to that, the round-trip problem is indeed minor.
>> No, of course not: you have one object layer, and all the application code
>> uses just that. You only have to adjust the object-relational layer.
Carl> Agreed, if an OR layer is used.
Carl> OR layers produce lots of memory and performance overhead.
no, not necessarily. In fact, frequently the OR layer is used to add some caching, improving the speed relative to the RDBMS. I will conceed that an OODBMS (which usually also does sophisticated chaching, it seems) has an advantage here.
>> And BTW, much of this can be minimized by having the object-relational
Carl> layer
>> access the RDBMS mostly thru relational views. Views offer a device to
Carl> avoid
>> changes in the schema (like (de)normalization, or schema evolution)
Carl> affecting
>> application code; they can buy you precious convenience and time, which is
>> simply unavailable in an OODBMS system.
Carl> As far as I am informed, views still are very slow.
Carl> Relational databases have only just started to allow indices on views.
Or do you mean updateable join views ? They have indeed only recently (Oracle 7.5.4 I believe, ... came out 1995 or so) become available, and not universally.
Or do you mean indexes on functions of column values (e.g., indexing case-insenstively)? They are in fact trivial to implement, but not very widely available.
Cheers,
Philip
-- If you have a procedure with 10 parameters, you probably missed some. (Kraulis) ----------------------------------------------------------------------------- Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm A2-08 +44 (0)1223 49 4639 / Wellcome Trust Genome Campus, Hinxton +44 (0)1223 49 4468 (fax) \ Cambridgeshire CB10 1SD, GREAT BRITAIN PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53Received on Mon May 14 2001 - 15:42:00 CEST