Re: Unknown SQL

From: Philip Lijnzaad <lijnzaad_at_ebi.ac.uk>
Date: Sat, 21 Jul 2001 18:02:33 GMT
Message-ID: <u7bsp43zyn.fsf_at_sol6.ebi.ac.uk>

On Mon, 7 May 2001 15:41:48 -0600,
"Jim" == Jim Melton <jim.melton_at_lmco.com> wrote:

Jim> "Philip Lijnzaad" <lijnzaad_at_ebi.ac.uk> wrote in message Jim> news:u7wv7xaycl.fsf_at_sol6.ebi.ac.uk...
>> Do you mean something like
>>
>> UPDATE Customers c
>> SET c.balance = c.balance +
>> (SELECT SUM(amount)
>> FROM Payments p
>> WHERE c.custid = p.custid
>> AND p.date > c.last_update
>> HAVING SUM(amount) > 0.0) ,
>> c.last_update = TODAY()
>> WHERE c.credit_rating < 3;
>>
>> If OODBMs offer this, excellent. I haven't seen the query capability
>> necessary for this yet, though.

Jim> At the risk of stirring up the dogs and cats, I would question the validity Jim> of this statement.

(to be honest, I haven't tested the above SQL ... it is meant to give an impression of why you can't separate querying from updating)

Jim> One of the criticisms leveled against the object
Jim> databases is that they are too tightly coupled to the application. However,
Jim> the above data model is fraught with application implementation details.

yes, the last_update column is an implementation detail. Does that make it 'fraught with'? The Customers and Payments tables can be used in completely different ways, if needed. E.g., a simple query can give you the sum of all outstanding payments for foreign customers, or whatever. It's nice not to be forced to navigate a network of pointers for finding this out.

Moreover, such queries (and updates) can be typed in interactively if needed, for exploratory purposes.

Jim> There is a customer record (in table Customers) that stores a calculated
Jim> balance. This balance is not necessarily up to date with all the financial
Jim> transactions credited to this customer, so the data is not necessarily
Jim> coherent. Sure, the customer record has a last update field, but that is
Jim> purely an implementation detail, not intrinsically part of the data model.

Jim> OO theory (forget about the database part for a minute) says that you
Jim> should model your data around your domain. This is the most stable part
Jim> of your system because the real world doesn't change nearly as fast as

Jim> algorithms or programming languages or even data storage/retrieval Jim> mechanisms.

In theory (and in practice), the data model is more stable than the 'business logic', but this does not mean that you don't have to cater for change in the data model. Also, it depends on the domain; in my field (bioinformatics), I have seen more changes in the data model than in the 'business logic' (let alone the programming languages: they have been stable for years).

And in fact, from a pragmatical point of view, the fact that RDBMSs are largely standardized, it is much easier to switch to different languages or even RDBMS vendors. Same can not be said for OODBMSs. Some argue that this is not a strength of the relational model, but rather a failure (since what? 10 years?) on the part of the OODBMS vendors to get their act together on the standardization front. Others will argue that it _is_ easier to standardize implementations of well understood technology such as relational theory, and that object oriented persistence is difficult simply because there is no overarching formal theory for objects.

Jim> So, a row in the Customers table is analogous to a Customer object. An
Jim> accessor method, get_balance() could return the precomputed, cached (if
Jim> potentially out of date) balance attribute stored with the Customer object
Jim> (just as the relational model above) or it could compute it on the fly by
Jim> accessing the transactions (both credits AND debits) associated with this
Jim> customer. The difference is that the OODBMS will not have to do the join

Many people seem to dread the join, but the worst case performance (i.e. no indexes, no, clustering, no statistics-based optimization nor denormalization) for a join is only O( n Log(n) ) with n being the sum of rows (or rather pages) of all the tables involved. With indexes (the usual case), this becomes just:

O(m+s)

(m=number of rows in result, s the number of rows in the smallest table). For some applications (e.g. multi-media) this is still too much, but for most, this is perfectly acceptable.

Jim> (WHERE c.custid = p.custid) because the transactions associated with a Jim> customer are already associated with the customer, allowing direct access.

yes, but that can be weekness as well as a strength: in the above case, if an update on one row fails, this might be due to problems with the business logic, rather than with the consistency of that one particular row. If so, giving each row its separate transaction makes things less secure. IOW, the bulk transaction gives an extra safety net for free. This is of course not a feature of RDBMSs per se, but the fact that they allow (nearly force) you to think of data as sets of things makes it very natural.

                                                                      Philip

-- 
If you have a procedure with 10 parameters, you probably missed some. (Kraulis)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm A2-08
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD,  GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53

Received on Sat Jul 21 2001 - 20:02:33 CEST

This message: [ Message body ]
Next message: Philip Lijnzaad: "Re: Unknown SQL"
Previous message: Philip Lijnzaad: "Re: Unknown SQL"
Maybe in reply to: Bob Badour: "Re: Unknown SQL"
Next in thread: Philip Lijnzaad: "Re: Recursive closure"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message