Re: Mixing OO and DB

From: Marshall <marshall.spight_at_gmail.com>
Date: Sun, 9 Mar 2008 13:50:07 -0700 (PDT)
Message-ID: <62fb68eb-674f-4ce4-badc-6df8289e06c0_at_e25g2000prg.googlegroups.com>


On Mar 8, 6:14 pm, Robert Martin <uncle..._at_objectmentor.com> wrote:
>

> It's not about importance. Of course the query is every bit as
> important as the rule. It's just that the query and the rule need not
> be coupled.

Unless all the rules are necessarily parameterized on the exact same set of attributes (which would allow one to write the queries to produce only those attributes), they *are* coupled.

> e.g.
>
> slaggards = find_slaggards();
> fire(slaggards);
>
> Note that the query and the rule work nicely together, but are
> separate. The query could have been used with a different rule:
>
> slaggards = find_slaggards();
> list(slaggards);

Is the set of attributes you need to fire an employee the same as the set you need to list them? Almost certainly not. You can fire them with just their employee ids, but in a listing meant for a person to read, raw ids alone are not going to be useful or informative.

If you want to give the slaggards all a 5% pay cut, you also need to know their current salary. The set of attributes needed for a particular use-case is highly coupled, *necessarily* coupled to the use-case.

So we can get around this with an ORM! The framework will do queries and only retrieve the ids of the relevant employees (where "relevant" is specific to the query). Then we can do something like this:

  for (id in ids)
    get_full_employee_record(id);

Of course, performance will go in the toilet.
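
As a rough illustration of why, here is a minimal sketch comparing the per-id loop with a single query that asks for exactly the attributes a listing needs. The `employee` table, its columns, and the `rating <= 1` definition of a slaggard are assumptions made up for this example, and an in-memory SQLite database stands in for a real server, so each counted statement represents what would be one network round trip.

```python
import sqlite3

def make_db():
    # Hypothetical schema and data, invented for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, rating INTEGER);
        INSERT INTO employee VALUES
            (1, 'Ann', 50000, 1), (2, 'Bob', 60000, 5), (3, 'Cy', 40000, 1);
    """)
    return conn

def list_per_id(conn):
    """Fetch the ids first, then one full record per id (the loop style)."""
    statements = 1  # the initial id query
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM employee WHERE rating <= 1 ORDER BY id")]
    rows = []
    for emp_id in ids:
        statements += 1  # one more statement per employee
        rows.append(conn.execute(
            "SELECT id, name, salary FROM employee WHERE id = ?",
            (emp_id,)).fetchone())
    return rows, statements

def list_in_one_query(conn):
    """Ask for exactly the attributes the listing needs, in one statement."""
    rows = conn.execute(
        "SELECT id, name, salary FROM employee WHERE rating <= 1 ORDER BY id"
    ).fetchall()
    return rows, 1

conn = make_db()
print(list_per_id(conn))        # same rows, 1 + N statements
print(list_in_one_query(conn))  # same rows, 1 statement
```

Both functions return the same rows; the statement count grows with the number of matching employees in the first and stays at one in the second.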

And in fact, even if the attribute sets *do* match up, you might still be wasting time. Many use-cases can be handled by a single update, where *nothing* needs to be retrieved.

You want to change some fields in a table, parameterized on values in that and another table? A lot of business logic comes down to that. It can be done in a single SQL statement, sent to the server in a single network round trip.
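
A minimal sketch of that kind of update, using the 5% pay cut as the business rule. The `employee` and `review` tables, the `score` threshold, and the function name `cut_slaggard_pay` are all hypothetical, chosen only to show one table being updated based on values in another without retrieving any rows first.

```python
import sqlite3

def make_payroll_db():
    # Hypothetical schema and data, invented for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
        CREATE TABLE review (employee_id INTEGER, score INTEGER);
        INSERT INTO employee VALUES (1, 'Ann', 50000), (2, 'Bob', 60000), (3, 'Cy', 40000);
        INSERT INTO review VALUES (1, 1), (2, 5), (3, 2);
    """)
    return conn

def cut_slaggard_pay(conn, max_score=2, percent=5):
    """Apply the pay cut in one statement; no employee rows are fetched."""
    cur = conn.execute(
        """
        UPDATE employee
           SET salary = salary * (100 - :percent) / 100
         WHERE id IN (SELECT employee_id FROM review WHERE score <= :max_score)
        """,
        {"percent": percent, "max_score": max_score},
    )
    conn.commit()
    return cur.rowcount  # number of rows the UPDATE changed

conn = make_payroll_db()
changed = cut_slaggard_pay(conn)
print(changed)
print(conn.execute("SELECT id, salary FROM employee ORDER BY id").fetchall())
```

The client never sees the current salaries at all; the server reads and rewrites them itself, which is the point being made above.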

The network is a critical consideration here. Robert's Rules of Encapsulation do not come at any particular performance penalty when all the data is held in a single process, everything is random access, and function call overhead is approximately one instruction. In a networked world, in a client-server world, in the programming-for-the-datacenter world, Robert's Rules are a performance disaster. I have observed five and six orders of magnitude performance difference between the style of programming Robert advocates and just writing plain SQL.

The problem is that when the per-query or per-update overhead goes from instruction-speed to network-roundtrip-speed, performance easily becomes dominated by just that overhead. To minimize it, it is necessary to make the application's network protocol (the set of queries and updates) as high-level as possible. Breaking everything down into tiny methods is antithetical to that; it's exactly the opposite of what needs to happen. The *perfect* thing would be if you could express everything you want to happen in a *language*, and just send some of that language over the wire. Which of course is exactly what happens when you send SQL via JDBC or whatever.
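
To put rough numbers on the overhead argument, here is a back-of-the-envelope sketch. The figures are assumptions picked for illustration, not measurements: about 1 ns for an in-process call, about 0.5 ms for a network round trip, and 10,000 rows to process.

```python
# All figures below are illustrative assumptions, not measurements.
CALL_NS = 1               # assumed cost of an in-process function call
ROUND_TRIP_NS = 500_000   # assumed cost of one network round trip (0.5 ms)
ROWS = 10_000             # assumed number of rows touched by the use-case

def total_overhead_ns(calls, per_call_ns):
    """Fixed per-call overhead only; ignores the actual work done per row."""
    return calls * per_call_ns

in_process = total_overhead_ns(ROWS, CALL_NS)            # tiny methods, one process
per_row_remote = total_overhead_ns(ROWS, ROUND_TRIP_NS)  # tiny methods, over the network
one_statement = total_overhead_ns(1, ROUND_TRIP_NS)      # one high-level statement

print(per_row_remote // in_process)     # overhead ratio: network vs in-process
print(per_row_remote // one_statement)  # overhead ratio: per-row vs single statement
```

Under these assumed numbers, the same fine-grained call pattern costs several orders of magnitude more overhead once each call becomes a round trip, while a single statement pays the round-trip cost once.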

Of course the above is a very pedestrian argument, and it would go away if the network were sufficiently fast. It's certainly not the *only* argument, but in the client-server case, just this argument by itself is reason enough not to go chopping up and encapsulating all your queries and updates.

Marshall

Received on Sun Mar 09 2008 - 21:50:07 CET