Re: Mixing OO and DB

From: frebe <frebe73_at_gmail.com>
Date: Fri, 21 Mar 2008 22:42:44 -0700 (PDT)
Message-ID: <5ecdaf61-2e95-43b2-9135-005062f96856_at_z38g2000hsc.googlegroups.com>


On 22 Mar, 00:23, Patrick May <p..._at_spe.com> wrote:
> frebe <freb..._at_gmail.com> writes:
> > On 19 Mar, 22:31, Patrick May <p..._at_spe.com> wrote:
> >> frebe <freb..._at_gmail.com> writes:
> >>      That's fine, but splitting hairs from the application
> >> implementation point of view.  The application implementation sees
> >> a particular schema specification (tables, views, or some
> >> combination).
>
> > But it is still a particular logical schema.
>
>      So what?  The particular implementation of the application sees a
> particular schema.  Whatever you call it, it changes for different
> reasons than the application implementation and at different rates.

That claim is repeated many times, but still not proven.

> The schema also exposes data structures that usually differ from those
> of application implementation.  

If you use an OOPL this might be true, since objects used as data structures differs from relations. When I realized this and stopped using objects as data structures, life is so much better...

> >>      There are multiple possible sets of tables and views that can
> >> provide the same information.  Therefore Mr. Selzer's claim that
> >> the application implementation and the specific schema it uses
> >> cannot be decoupled is incorrect.  The specific schema is not
> >> "integral" to the application implementation.
>
> > Previously you have been invited to demonstrate how a general ledger
> > application can be decoupled from its general ledger schema. Unless
> > you accept this invitation, we have to assume this claim to be
> > unsupported.
>
>      That is incorrect.  Previously someone suggested that a general
> ledger could use a content management schema.  That was a ridiculous
> thing to say and completely unrelated to this discussion.

So you don't claim that a general ledger schema could be decoupled from a general ledger appliations? In that case we have an agreement.

>      To give an example of decoupling, consider a clearing and
> settlement system.  The schema holds customers, accounts, positions,
> books, trades, instruments, and other information related to the
> clearing and settlement of financial transactions.  Some of the
> business logic is best expressed as operations on directed graphs
> (eliminating cycles to create a directed acyclic graph, identifying
> liquidity bottlenecks, selecting the transactions to settle, etc.).
> These graphs need to be dynamically updated and are sufficiently large
> that recreating them for every operation is prohibitively expensive in
> terms of performance.

Do you claim that directed graphs could be represented as tuples in relations? Do you claim it doesn't exists a solution that only uses relations as data structure?

>      The model used by this application is significantly different
> from that stored in any relational schema.  Decoupling them is
> essential to the performance and maintainability of the system.

"The model used by YOUR application", I would add. How can we validate "Decoupling them is
essential to the performance and maintainability of the system"?

> >> >> Since you agree that multiple different physical schemas are
> >> >> possible, decoupling the application from any particular set of
> >> >> those is both possible and good design.
>
> >> > Multiple logical schemas are also possible. The decoupling is
> >> > done using views.
>
> >>      That's one way.  Non-relational languages provide others.
>
> > So we have an agreement that using views is a valid solution for the
> > problem?
>
>      It is one valid solution for some instances of the problem.  It
> is not universally applicable.  Do you disagree?

Not all SQL databases supports updatable views in a good way. Some products also supports views backed up by a stored procedure. I would say that views, when availible, are universally applicable.

> >> > I thought that Brians examples with coordinates was pretty
> >> > obvious, but maybe some futher clarification is needed. Suppose
> >> > you have a table for Cartesian coordinates. Now you have an
> >> > application that needs to work with polar coordinates
> >> > instead. The solution is to create an updateable view for polar
> >> > coordinates, which derives the data from the original
> >> > table. Another solution could be to create a new table using
> >> > polar coordinates and creating a view (with the same name as the
> >> > original table) providing the Caresian coordinates. The logical
> >> > schema will now contain two relations for the same data in
> >> > different format.
>
> >>      Alternatively, the data could be loaded into an object that
> >> does the conversion on demand.  The rest of the application
> >> wouldn't need to know about the database or the specific schema at
> >> all.
>
> > And the benefit with that is?
>
>      An object can convert any coordinate, without extra views or
> storage.  What is the disadvantage of this approach?  What is the
> clear benefit of using relations?

The major benefit is that you don't need to change your application at all, when you change the physical storage of the coordinates. Another benifit is that all applications (including report generators and interactive SQL) have both representations availible. Also, imagine the query below assuming the base relation is using Caresian coordinates.

select angle, distance
from coordinate
where distance < ?

How would your solution look like?

> > I will give a short example of why I think views are a better
> > solution than wrapping every SQL statement in a method.
>
> > Lets take Martins famous find_employees_eligible_for_retirment
> > example.
>
>      I'd like to hear your explanation in the context of coordinate
> conversion, as well.

See above.

> > We have two solutions, one is to create a method.
>
> > find_employees_eligible_for_retirment()
> > {
> >   return query("select id, name, birthdate from employee where
> > yeardiff(now(), birthdate) > 60)");
> > }
>
> > The corresponding view would look like:
>
> > create view employee_eligible_for_retirement as
> > select id from employee where  yeardiff(now(), birthdate) > 60
>
> > In the first example the set of attributes is fixed (id, name,
> > birthdate), and the only way to get extra attributes (or join extra
> > tables) is to make another database call for every employee.
>
> > Using a view you could easily join extra attributes without making
> > extra database round-trips and have superior performance.
>
> > select e.id, e.salary
> > from employee_eligible_for_retirement r join employee e on e.id=r.id
>
>      With a function wrapper, you could just create another SQL
> statement.  If you found yourself repeating the retirement criteria,
> you could encapsulate that, perhaps even using a view, so that it is
> shared by both functions.

In my case you would end up with:

get_eligible_for_retirment_criteria()
{

    return "yeardiff(now(), birthdate) > 60"; }

find_employees_eligible_for_retirment()
{

   return query("select id, name, birthdate from employee where " + get_eligible_for_retirement_criteria()); }

find_employees_eligible_for_retirment2() {

   return query("select id, salary from employee where " + get_eligible_for_retirement_criteria()); }

For every possible set of attributes or joins, you would have to create another method with very simple SQL statements. What do you gain with that? Why not just write

select e.id, e.salary
from employee_eligible_for_retirement r join employee e on e.id=r.id

where you need it? The complex part is already abstracted away.

>      You seem to be making the assumption that there will be a
> proliferation of these functions.  That's not the case in practice.

This answer indicates that you try to limit the number of finder methods and instead reuses the first once, with performance problems as a result. Either you create new methods for every new possible SQL query, or you reuse old ones that doesn't really fit.

> The ad-hoc querying capabilities of SQL are simply not required in
> most applications.

In my current project (invoicing and contract management for real estate business), I can count to at least 20 different select statements using the receivable table. People with poor knowledge in SQL just makes two queries:
select * from mytable
select * from mytable where id=?

and they solve the rest in their applications, with poor performance as a reslut.

>  Given that, there are some significant advantages
> to decoupling the application from the schema:
>
>      - Functions (or methods) provide descriptive names.  Many
>        enterprise database schemas are relatively complex, making the
>        SQL statements to accomplish something like reading all
>        employees eligible for retirement less than intuitive.

The same could be said about views.

>      - The application is insulated from changes to the business rules
>        maintained by the database.  If the criteria for retirement
>        eligibility changes, there is no need to change existing,
>        working, tested code that uses the function.  Only the function
>        itself needs to be regression tested.

Only the view employee_eligible_for_retirement need to change. No application has to be changed at all.

>      - The function can return the data structure required by the
>        application, which is unlikely to be the tuples returned by the
>        database query.

A select statement returns the values required by the application.

> > Lets say you want to se how many employees in a specific department
> > that are eligible for retirement, you could easily write another SQL
> > query.
>
> > select count(*)
> > from employee_eligible_for_retirement r join employee e on e.id=r.id
> > where e.department=?
>
> > Using your method, you would have to write a new method for every
> > single SQL statement, duplicating the eligible_for_retirement
> > business rule all over you "persistence layer". The other option
> > would be to use the original find_employees_eligible_for_retirment
> > method and performing post-processing having terrible performance.
>
>      Eliminating duplication is a good reason to encapsulate the
> database-specific components.  The encapsulating functions can use
> views or any other techniques required to eliminate duplication at
> that level, and the functions themselves eliminate duplication within
> and across applications.

So you agree that a view is necessary in this example? (If not, please show the OO solution for the problem above.) If you already have the view, what more duplication is there to eliminate?

> >> > On the logical level, virtually any data can be represented in a
> >> > relational form.
>
> >>      On the logical level, all Turing complete languages are
> >> equally powerful.  For particular applications, some are far more
> >> expressive and usable than others.  Similarly, the relational model
> >> is not always the most expressive for a particular application.
>
> > Can you give some examples of data that doesn't fit for the
> > relational model on the logical level?
>
>      That's the wrong question.  Can I give an example where the
> relational model is not the most expressive and usable form of the
> data?  Yes, see the clearing and settlement example above.

Why is the relational model not the most expressive and usabel form in that example?

> >> >> There is therefore a need to translate between the data
> >> >> structures, which is another good reason to decouple the
> >> >> application from the specific physical schema being used.
>
> >> > Since the application is not dealing with the physical schema, it
> >> > is not able to do that translations. It is already done in the
> >> > RDBMS.
>
> >>      That is not correct.  The application deals with a specific
> >> schema with data represented in the relational model.  That is not
> >> always the format required by the application.
>
> > What is incorrect? The fact the the applications access the logical
> > schema?
>
>      The claim that the application can't do translations is
> incorrect, on its face.  SQL isn't going to return DAGs, or even
> queues, stacks, or maps.

A DAG can obviously be represented as a set of tuples. The same applies to queues, stacks or maps, even though I can't really see why you need a low-level data structure like a map, when relations are availible.

//frebe Received on Sat Mar 22 2008 - 06:42:44 CET

Original text of this message