
Re: Relationals vs. Objects Databases I

From: DTF User <user_at_young.epc.lmms.lmco.com>
Date: 1998/02/09
Message-ID: <34DF5546.396253BD@young.epc.lmms.lmco.com>

d39902_at_doxbox.pnl.gov wrote:

> [-- snip--]
>
> Why will an RDBMS out perform an ODBMS on a pattern match or an interval
> comparison? Is there a fundamentally different mechanism for comparing
> strings or integers that RDBMSs employ? I don't think so...
>

The mechanism of the comparison is the same, and I wouldn't argue with that. The fundamental difference is in the data access. Several ODBMSs that I am familiar with performed retrieval mainly by primary key, which is the object ID. The indexing mechanism was pretty weak and inconvenient to handle, and query optimization for the type of comparison I mentioned was poor.

> >For example: you have just three object classes - customers, transactions and
> >inventory.
> >Number of instances are 20000, 50000 and 5000000 respectively which I would say
> >relatively modest.
>
> Yes, that's modest by the databases I have implemented in an OODB -
> though I must admit that I'm using an OODB precisely because the size
> and complexity of data that I'm dealing with would cripple an RDBMS...
>

Let's set aside data complexity for a moment. I haven't heard of an RDBMS being crippled by the size of its data, unless, as I mentioned, the database schema is extremely poor. Otherwise all the data warehouses with terabytes of data built on RDBMS engines wouldn't be able to operate.
As for data complexity, it is a double-edged blade. On the one side, it is certainly more laborious to map a complex object model into a relational database schema, and I usually take my time doing so. On the other side, along the way, while doing data normalization, I can identify the need for indexes, data distribution, storage requirements and so on. I have yet to see a problem that will cripple my database.

> [ -- snip -- ]
> Well, you don't give me much to go on here, but I'll give it a whirl.
> You'll just have to forgive me if I make schematic assumptions here.
> So here's a trivial schema that captures your description:
>
> cust (custID, name, street, city, state, zip, region)
> inv(itemID, cost, category)
> trans(custID, transID, date)
> transDetail(transID, itemID, quantity)
>
> Note that in reality you'd probably want item information separate
> from inventory and a separate table that maps items into potentially
> multiple categories. But we can at least work with this. NOTE: I
> don't consider this a very good schema - it certainly has some
> functional dependencies that are naughty.
>

They are certainly inconvenient, to say the least. And I see that you are trying to flatten the query by separating the transaction details from the transaction. But that is OK, because I would do the same thing for the relational model.
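For concreteness, here is roughly how I would write that schema out as DDL, with the keys made explicit. This is my own sketch only; the column types, the trans_date rename and the constraint choices are my assumptions, not something from your post:

  -- Sketch of the quoted schema with primary and foreign keys spelled out.
  -- "date" is a reserved word in Oracle, so the column is called trans_date here.
  create table cust (
      custID   number        primary key,
      name     varchar2(80),
      street   varchar2(80),
      city     varchar2(40),
      state    char(2),
      zip      varchar2(10),
      region   varchar2(10)
  );

  create table inv (
      itemID   number        primary key,
      cost     number(10,2),
      category varchar2(30)
  );

  create table trans (
      transID    number      primary key,
      custID     number      references cust(custID),
      trans_date date
  );

  create table transDetail (
      transID  number        references trans(transID),
      itemID   number        references inv(itemID),
      quantity number,
      primary key (transID, itemID)
  );

Declared this way, the primary keys get unique indexes automatically; the foreign key columns would still need their own indexes for the joins, which matters for the discussion below.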

> OK, so the query you proposed above is given by something like the
> following where I assume you want customers in the Pacific Northwest
> who bought widgets from you in December of 1997 and the total
> value of those widgets was > $50:
>
> select cust.* from cust where
> cust.region='PNW' and
> (select sum(inv.cost) from trans, transDetail, inv where
> trans.custID = cust.custID and
> transDetail.transID = trans.transID and
> transDetail.itemID = inv.itemID and
> trans.date >= '12/1/1997' and trans.date < '1/1/1998' and
> inv.category = 'widget') > 50
>

OK, you beat me on this one. Your query is pretty short. It is not exactly what I had in mind, but it is close enough.
Still, we need to compare performance benchmarks.
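If we really want to settle the performance question on the relational side, Oracle at least lets us look at the plan before running anything. Something along these lines would be the starting point (a sketch only; it assumes PLAN_TABLE has already been created with utlxplan.sql, it uses the trans_date rename from my sketch above, and the date literals may need TO_DATE depending on your NLS settings):

  explain plan set statement_id = 'pnw_widgets' for
  select cust.* from cust where
      cust.region = 'PNW' and
      (select sum(inv.cost) from trans, transDetail, inv where
          trans.custID = cust.custID and
          transDetail.transID = trans.transID and
          transDetail.itemID = inv.itemID and
          trans.trans_date >= '12/1/1997' and trans.trans_date < '1/1/1998' and
          inv.category = 'widget') > 50;

  -- Walk the plan rows for just this statement, indented by depth.
  select lpad(' ', 2 * level) || operation || ' ' || options || ' ' || object_name
  from plan_table
  start with id = 0 and statement_id = 'pnw_widgets'
  connect by prior id = parent_id and statement_id = 'pnw_widgets';

The plan shows which join methods and indexes the optimizer actually picked, which is the part of the benchmark that matters.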

> So yes, that's about 8 lines (could be fewer if I'd used longer lines).
> Now, how about the computational complexity of this operation? We
> can probably assume that each table has a hash index on the "***ID" key
> to optimize the joins. We should also assume a btree index on the

Here we go: joins. And in your query I see at least one Cartesian product between the customers and the resulting projection of the subquery. Given the amount of data I mentioned before, that would cripple an ODBMS engine.
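To make the join processing explicit, the correlated subquery effectively becomes this flattened join once the optimizer is done with it. This is my own sketch of an equivalent form, not something from your post; note it lists the customer columns instead of cust.* because of the GROUP BY:

  select cust.custID, cust.name
  from cust, trans, transDetail, inv
  where cust.region = 'PNW'
    and trans.custID = cust.custID
    and transDetail.transID = trans.transID
    and transDetail.itemID = inv.itemID
    and trans.trans_date >= '12/1/1997' and trans.trans_date < '1/1/1998'
    and inv.category = 'widget'
  group by cust.custID, cust.name
  having sum(inv.cost) > 50;

That is four tables joined before the grouping, and it is exactly the kind of work I would not want to hand to the ODBMS engines I described above.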

> trans table for date. If we're going to do this frequently, we'll
> want an index by cust region as well, and maybe item by category -
> though that's not likely to be of much help here since the optimizer
> would likely choose to join cust with trans, then join with transDetail
> and finally with inv (the proof is left to the reader who can consult
> any textbook for the rationale).
>

> [ -- snip -- ]
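The indexes you describe would look roughly like this on the relational side. Again, this is a sketch with made-up names; in Oracle the primary keys already carry unique indexes, and the plain B-tree indexes below stand in for your hash indexes:

  -- Foreign key columns used in the joins.
  create index trans_cust_idx    on trans (custID);
  create index detail_item_idx   on transDetail (itemID);
  -- Selection columns from the query.
  create index trans_date_idx    on trans (trans_date);
  create index cust_region_idx   on cust (region);
  create index inv_category_idx  on inv (category);

Whether the optimizer actually uses the region and category indexes is, as you say, another matter.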

> cust (name, street, city, state, zip, region, transactions)
> inv(itemID, cost, category)
> trans(cust, date, items)
>
> In cust, transactions is a list of trans. In trans, items is a
> dictionary (i.e., a collection of key/value pairs) where the inv
> is the key and the quantity is the value (e.g., {inv, quantity}
> "tuples").
>
> So here's my implementation of the query above using the DML of the
> database I have in mind (note, it trivially assumes the existence of a
> set containing all the cust objects in our database):
>
> allCustomers select: [:c | ((c region) = 'PNW') and:
> ((c transactions) detect: [:t | ((t date) >= '12/1/1997') and:
> (t date) < '1/1/1998') and:
> (((t items) inject: 0 into: [:sum:i | (i key category) = 'widget')
> ifTrue: [sum + ((i key cost) * (i value))]]) > 50)])]
>
> [ -- snip -- ]

The same question of query performance still stands.

> A couple of salient observations. First, I would argue that for the
> person who knows both Smalltalk and SQL it's not much more difficult
> to write one query over another. So much for the argument favoring
> the ease of writing SQL vs. OODB queries. Oh, and I should also point
> out that GemStone can execute this Smalltalk query interactively just
> like you would with an SQL interpreter, there's no need to compile a C++
> program or anything like that. Other than that, I don't think the
> compactness of one solution over another is a very persuasive argument
> when talking about the expressiveness or power of a query language.
>

I can't argue with that, because my familiarity with Smalltalk is very vague and with GemStone it is almost nil. It might be a very good ODBMS, but its devotion to Smalltalk limits its usability tremendously.

> Second, as for the performance, the big win for the OODB is that there
> are no joins.

Hey, you are contradicting yourself. You have a lot of joins, and some of them are bad ones, in both of your queries.

> The inner-select in the SQL above is a multi-way join
> that must be executed over and over again for every customer in the PNW.

That is not true. The main advantage of the RDBMS over hierarchical databases was that it operates on sets, not individual records. With your second query you are trying to emulate this set operation, and you have to explicitly describe your container classes like allCustomers or the transaction list, etc. With an RDBMS they come as a natural feature; you don't have to describe them at all. I do not know how GemStone does it, but ObjectStore requires you to compile these container classes and update your schema with them before you can use the query.

> Granted, we can optimize that some with indexes, but the OODB solution
> I gave above completely eliminates the join processing altogether.

You absolutely did not. You do have joins in the queries, and you admitted it. I have underscored the word "join" in your post.

> So
> to those who try to slight OODBs with the criticism that they don't do
> "joins" I say whopeee! All things being equal, I think it's relatively
> easy to make the argument that the OODB query I gave is as efficient
> as the SQL query.

It is still not proven.

> Sure, we can debate about the level of optimization
> of particular products, but in the sense of evaluating the computational
> complexity of the two solutions, they're not that much different (the
> proof, again, is left as an exercise to the reader).
>
> As for the availability of off-the-shelf report writers, there I must
> concede that RDBMS have it over OODBs hands down, no question about it.
>

The same goes for all the other third-party and native products that were built over the years around RDBMS implementations. My points here are:
1. ODBMSs still haven't reached the same level of maturity as RDBMSs.
2. Even then, ODBMSs won't be able to solve all the problems and beat RDBMSs into the ground. For some applications they will take over; for others they won't.

> - David Hansen
> --
> David M. Hansen | Pacific Northwest National Laboratory
> Senior Research Scientist | Information Sciences & Engineering
> dm_hansen_at_pnl.gov | PO Box 999 MS K7-22
> (509)375-2362, fax (509)375-3641 | Richland, WA 99352-0999

--
Received on Mon Feb 09 1998 - 00:00:00 CST
