Re: Mixing OO and DB

From: Patrick May <pjm_at_spe.com>
Date: Tue, 04 Mar 2008 16:52:23 -0500
Message-ID: <m21w6qkvhk.fsf_at_spe.com>


frebe <frebe73_at_gmail.com> writes:
> On 2 Mar, 20:24, Patrick May <p..._at_spe.com> wrote:
>> >> Denormalization for performance is just such a schema
>> >> change. The data supported by the schema doesn't change, only
>> >> the format. If you have SQL littering your application code,
>> >> all of that has to be changed and retested. If you have an O-R
>> >> mapping layer, for example, only the mapping code needs to
>> >> change.
>>
>> > Still no example. I was asking for the schema before and after
>> > such change. Then we could evaluate if the application code would
>> > have to change too, or not.
>>
>> I'm not sure why you need an example here. Have you never
>> seen denormalization done for performance reasons? Do you not
>> realize that the entire point of doing so is to reduce the
>> bottleneck posed by the database without changing the external
>> behavior of the business logic that uses the database?
>>
>> One common denormalization technique is to eliminate a join by
>> adding columns to one table that replicate data in another table
>> that has a one-to-many relationship with the table being
>> denormalized. The business logic shouldn't have to change just
>> because the schema does.
>
> Since you keep the original column, no existing SQL statements would
> break. No existing business logic has to change. Only the code that
> wants to use the new redundant column needs to change.

     Exactly.

> Wrapping every SQL statement in a function would not give any
> advantage at all.

     Decoupling the SQL statements related to the now denormalized schema from the application code that invokes them has significant value. The change can be made and tested in isolation and the application itself doesn't have to change at all. When the new version of the component containing the SQL code is deployed, the application simply runs faster.
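
     To show the shape of that decoupling rather than argue any
particular schema, here is a rough sketch with made-up table and class
names. The application depends only on the interface; when the
redundant column is added, only the data access class changes:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical data access component.  Application code depends only
// on this interface and never sees the SQL or the schema behind it.
interface OrderSummaryDao {
    String customerNameForOrder(long orderId) throws SQLException;
}

// Before denormalization: the lookup joins orders to customers.
class JoiningOrderSummaryDao implements OrderSummaryDao {
    private final Connection conn;
    JoiningOrderSummaryDao(Connection conn) { this.conn = conn; }

    public String customerNameForOrder(long orderId) throws SQLException {
        String sql = "SELECT c.name FROM orders o "
                   + "JOIN customers c ON c.id = o.customer_id "
                   + "WHERE o.id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}

// After denormalization: orders carries a redundant customer_name
// column, so the join disappears.  Only this class changes and is
// retested; every caller of OrderSummaryDao is untouched.
class DenormalizedOrderSummaryDao implements OrderSummaryDao {
    private final Connection conn;
    DenormalizedOrderSummaryDao(Connection conn) { this.conn = conn; }

    public String customerNameForOrder(long orderId) throws SQLException {
        String sql = "SELECT customer_name FROM orders WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}

     Swapping which implementation gets wired in is the deployment
step; none of the calling code is touched, and the application simply
runs faster.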

> One could also argue whether your denormalization would gain any
> performance at all. Why don't you publish the schema, indexes, and
> query and let the guys from c.d.t. evaluate the possible performance gain.

     Because that has nothing to do with the core issue we're discussing. Decoupling components that change for different reasons and at different rates is simply good practice, whether those components are business services, database schemas, network protocols, GUIs, or anything else. Coupling components that change at different rates and for different reasons imposes maintenance costs and makes systems more difficult to enhance.

     Your requests for specific schema examples are a red herring that would take this conversation down a rathole of different design alternatives. The point is that, in practice, database schemas and application logic (and many other components) do change at different rates and for different reasons (sometimes they change for the same reasons, of course, but not always). Decoupling such components reduces the cost and risk of maintenance, testing, and system enhancement.

>> >> Changes to the schema to support new regulatory
>> >> requirements are not uncommon. Sometimes these are simple
>> >> additions, but they can also involve splitting or, less
>> >> commonly, merging existing columns. Changing ordinality is
>> >> another not infrequent change; an existing application can
>> >> continue to function by using the primary or highest priority
>> >> element on the many side of the relationship but a new
>> >> application or component can take advantage of the full set.
>>
>> > Please give one example (schema before and after).
>>
>> Again, why?
>
> Because when you finally showed an example in the denormalization
> scenario, it was obvious that wrapping SQL statements doesn't help.

     That's not the case. Decoupling the application logic and database schema makes perfect sense for the reasons above.

>> Have you never experienced these kinds of changes? They happen
>> often enough in my, and others', experience that the decoupling
>> provided by OO techniques has proven useful. Despite what some
>> relational proponents think, developers don't come up with these
>> approaches because they dislike or don't understand relational
>> databases. They do it because they're trying to solve real
>> problems.
>>
>> Consider a regulatory change like Sarbanes-Oxley. Companies
>> need to add additional monitoring rules, but they have existing
>> applications in place that they don't want to change any more than
>> absolutely necessary due to the high cost of doing so. The
>> applications can be encapsulated to work without change within a
>> different workflow. That workflow requires additional information
>> in the database, in particular more detailed audit information and
>> finer grained tracking of individual steps in the process. This
>> results in schema changes but does not change the behavior of
>> existing applications.
>
> Still no schema examples....

     Still waving that red herring around. Have you or haven't you ever experienced these kinds of changes?

>> >> During development, many shops are using iterative methods
>> >> that require changes to both the schema and the application.
>> >> When there is more than one project ongoing at the same time,
>> >> insulating the application from the schema allows the new
>> >> changes to be incorporated when it makes sense for the
>> >> application team, rather than requiring all teams to change in
>> >> lock step (a sure recipe for schedule slip).
>>
>> > These are all very nice words, but we need at least some
>> > examples to verify them.
>>
>> Particular schema examples will not help.
>
> They will not help you keep making unsupported claims.

     My claims are supported by numerous arguments at a higher level of abstraction than schema details. If you want to defend the coupling of components that change at different rates and for different reasons, do so at that level.

> By the way, letting two teams make unsynchronized changes to the
> same schema seems to be a good recipe for disaster.

     During development that can happen. It's yet another reason why decoupling the two provides value.

>> Either you've experienced this or you haven't.
>
> The question is whether your method is a good one or not, not
> whether the scenario has been experienced or not.

     No, the question is whether or not you've experienced the kinds of problems that tight coupling of logically disparate components can cause. The difficulties that arise are very real. The decoupling techniques weren't created just for the sake of coding.

>> If you haven't, you should at least be able to understand that
>> these kinds of environments do exist and these kinds of changes do
>> take place. This is not a discussion about particular schema
>> changes, it is about the value of decoupling.
>
> It is a discussion about whether wrapping SQL statements will help
> in handling schema changes or not. Without knowing what kind of schema
> changes we are talking about, any evaluation of your method will be
> impossible.

     Not so. Application logic and database schemas change at different rates and for different reasons. That's an excellent reason to decouple them, especially for anyone who has had to deal with situations where they were tightly coupled.

>> This is where experience comes into play. No one I know who
>> has actually developed a large system using OO techniques would
>> ever embed SQL directly in the application logic.
>
> Since OO techniques forbid programmers from embedding SQL directly
> in the "application logic", your statement is a tautology.

     That's not the case. Embedding SQL statements in method definitions isn't forbidden, and may even be how the encapsulating components are created.
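
     For example (a hypothetical class, purely illustrative), the SQL
can sit directly in a method body and still be fully encapsulated; the
method is the boundary, and callers never depend on the statement or
the schema behind it:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical encapsulating component with the SQL written directly
// in the method definition.  Callers see a business-level operation,
// not an UPDATE statement.
class InvoiceRepository {
    private final Connection conn;
    InvoiceRepository(Connection conn) { this.conn = conn; }

    public void markPaid(long invoiceId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE invoices SET status = 'PAID' WHERE id = ?")) {
            ps.setLong(1, invoiceId);
            ps.executeUpdate();
        }
    }
}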

>> They've personally experienced the costs of doing so and don't want
>> to go through it again.
>
> Mainstream OOPLs have very poor support for embedding SQL; that's
> why the cost might be high. Older non-OO languages have much better
> support for embedding SQL.

     So basically you're just suggesting that OO techniques should never be used?

>> >> >> This isn't even an issue for the systems I work on.
>>
>> >> > Why not?
>>
>> >> I've recently been working on low latency systems in the
>> >> financial services industry. We have to keep the database, and
>> >> any other disk based storage, completely out of the critical
>> >> path of the business transaction or we won't meet our
>> >> performance requirements.
>>
>> > What kind of "business transactions"?
>>
>> Most recently a large risk analytics engine. The business
>> transactions, or business use cases, if you prefer, involve
>> calculations like value at risk for portfolios containing hundreds
>> of thousands of positions. The only way to get acceptable latency
>> is to co-locate the business logic, messaging mechanism, and data
>> in the same operating system process. The only way to get
>> acceptable throughput is to run dozens of those processes in
>> parallel. Hitting a database would make the system unusable.
>
> I am happy it was some kind of simulation, since it seems very
> risky to make real business transactions without hitting disk-based
> storage.

     This isn't a simulation, it's a component of an automated trading system that makes real money every day the market is open.

>> Another example is determining the impact of faults in a
>> telephone network. Probes return a large amount of information,
>> only some of which is pertinent. Raising alerts that make sense at
>> the business level, like "The SLA for Big Customer's video
>> conference service is about to be violated," requires several
>> levels of analysis and aggregation. That has to be done at
>> in-memory speeds.
>>
>> What kind of systems do you build? Perhaps we can find some
>> common problem domains that would make this conversation more
>> productive.
>
> Currently I am developing a web-based invoicing and contract
> management application for real estate enterprises using the LAMP
> stack. The database is obviously a very important part of this
> application, and MySQL provides excellent performance.

     "Excellent performance" is domain dependent. Is this system primarily CRUD or is there some complex business logic involved?

Sincerely,

Patrick



S P Engineering, Inc. | Large scale, mission-critical, distributed OO
                       | systems design and implementation.
          pjm_at_spe.com  | (C++, Java, Common Lisp, Jini, middleware, SOA)