
Re: Sniffing redo logs to maintain cache consistency?

From: Andrej Gabara <andrej_at_kintana.com>
Date: 23 Feb 2003 15:15:21 -0800
Message-ID: <11a3a163.0302231515.4c1db66d@posting.google.com>


Noons wrote:

> I must apologize for putting it so bluntly, but
> having been on the receiving end last year of
> a similar dose of "enlightened" Java design,
> it's stronger than me...   :D

I don't mind if you are blunt. If you tell me my idea is stupid and show me why, I owe you. What were some of those "enlightened" but stupid Java designs?

First, I'm not a Java programmer who thinks he knows it all; otherwise I would not be asking this newsgroup for feedback.

Maybe I should have spent more time explaining what my objective was and how I got there. My idea may be stupid, but my problems are real. So let me start with the problems.

We currently have an application server written in Java, but most of the business logic is in PL/SQL (the workflow engine, for example). Our goal is to enhance the architecture so that it performs and scales better. Our performance tests have shown very high Oracle CPU usage as well as high JDBC overhead: high CPU because most of our business logic runs in PL/SQL, and high JDBC overhead because we have to ship lots of data from the database to the app server. (This architecture was not designed by Java programmers, btw.)

Our app server runs an HTTP server as well as a servlet engine and talks to the database. So the tiers are:

    CLIENT --(http)--> APP SERVER --(jdbc)--> ORACLE RDBMS

To scale, we can configure multiple app servers to run in a cluster. That helps us scale HTTP request processing, but a lot still goes through the single database.

To improve performance, we started caching some heavy-weight objects and running validation queries when accessing them. A validation query is faster than reloading the entity from the database, so that is not too bad. However, because of this, our object model (or data model) contains "fat" entities: a small change to an entity invalidates the whole fat object, and rebuilding it costs a few heavy JDBC queries.
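
To make the validation-query idea concrete, here is a minimal Java sketch. The table and column names (requests, last_updated) are invented for illustration, since the post doesn't show our schema; the point is just that a cached version stamp is compared against the row's current one instead of reloading the whole fat entity:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    class CachedEntity {
        final long id;
        final Timestamp lastUpdated; // version stamp captured when loaded
        // ... the rest of the "fat" entity's data would live here ...

        CachedEntity(long id, Timestamp lastUpdated) {
            this.id = id;
            this.lastUpdated = lastUpdated;
        }

        // Cheap validation query: one indexed single-row lookup instead of
        // the several heavy JDBC queries it takes to rebuild the entity.
        boolean isStillValid(Connection conn) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT last_updated FROM requests WHERE id = ?")) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() && lastUpdated.equals(rs.getTimestamp(1));
                }
            }
        }
    }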

Now we're at the point where we need to figure out what changes we must make in order to scale. Also, our object model pretty much reflects our data model one-to-one, so it is not object oriented and not very close to the real world we're trying to model. Our architecture is very close to the session facade design pattern in the J2EE world (a bunch of session beans and lots of plain value objects).

Some of the goals I had in mind were:

(1) To reduce the load on the database (for scalability and performance), move business logic from the database to the app server.

(2) To get an object model that more closely represents the real world, start out with a Java object and then figure out how to map it to the database (instead of starting out with a relational data model and then mapping it to Java).

(3) Use an O/R mapper that can persist objects automatically --> improved developer productivity.

(4) A finer-grained object model (less invalidation when something changes).

(5) Validation queries to keep the cache consistent are expensive, so avoid them. It would be nice if the database would let us know what changed.

(6) Reduce JDBC overhead, most of it due to network latency. Even if Oracle has a record in its memory cache, it is expensive to ship that record over the network to the app server.

(7) The object cache in the app server should hold real-world Java objects, not database records. This also avoids the extra time it takes to convert records into Java business objects. The closer the cached objects are to what they are used for, the better.

I could list more, but those are the main ones. The main point I want to make is that caching Java objects is critical when implementing business logic in the Java app server. You don't want to go to the database every time you need data to execute the business logic. And validation queries don't help if you have a fine-grained object model; they pretty much turn off caching.

So, ideally, we should change our architecture so that all changes to data are made in the app server, in which case we know which objects become invalid, and cache consistency is simple.
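
Here is a rough sketch of what that looks like when the app server owns all writes (class, table, and column names are made up; only the pattern matters): the update and the cache eviction happen in the same place, so no validation query is needed at read time.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class RequestService {
        // Cached business objects, keyed by primary key.
        private final ConcurrentMap<Long, Object> cache = new ConcurrentHashMap<>();

        void rename(Connection conn, long id, String newName) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE requests SET name = ? WHERE id = ?")) {
                ps.setString(1, newName);
                ps.setLong(2, id);
                ps.executeUpdate();
            }
            // The app server made the change itself, so it knows exactly
            // which cached object is stale. No validation query needed.
            cache.remove(id);
        }
    }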

However, we have so much business logic in PL/SQL that it will take us a very long time to convert. Probably more time than we can afford. To make matters worse, we have customers who like to customize functionality with triggers and PL/SQL. So there are issues with the feasibility of implementing such an ideal architecture (actually, it could turn out that this architecture is not ideal, but that's what some people claim... and I bought it).

Therefore, I was wondering if we can migrate to this new architecture gracefully, without having to convert everything at once. But as long as we have business logic in the database, it is very difficult to cache Java objects and keep them valid; at least, very difficult to do in a way that performs well.

That's why I was wondering whether the database could tell us when something changed, instead of us pinging the database with validation queries.

An architecture that caches Java objects but also has business logic in the database doesn't scale when it requires validation queries. But if the database could push change information to the app server (even if it is only invalidation data, such as a table name and primary key), then having business logic in the database and a Java object cache in the app server seems possible.
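
A minimal sketch of what that could look like on the app server side, assuming some delivery channel (redo-log sniffer, trigger-fed queue, whatever) hands us (table, pk) pairs. All names here are invented, and the channel itself is left abstract as a BlockingQueue:

    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;

    class InvalidationListener implements Runnable {

        // Which table and which row a cached object came from.
        record CacheKey(String table, long pk) {}

        private final BlockingQueue<CacheKey> invalidations; // fed by the channel
        private final Map<CacheKey, Object> cache = new ConcurrentHashMap<>();

        InvalidationListener(BlockingQueue<CacheKey> invalidations) {
            this.invalidations = invalidations;
        }

        public void run() {
            try {
                while (true) {
                    // Evict whatever the database says changed; the next read
                    // reloads fresh data. No per-access validation query.
                    cache.remove(invalidations.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shut down quietly
            }
        }
    }

The hard part, of course, is the channel itself, which is exactly what this thread is about.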

Marc Fleury (2, see below) states that "cache is king". You won't get a J2EE application server that uses entity beans to perform and scale if you don't cache. As long as you implement all business logic in the J2EE app server, and don't have PL/SQL changing data behind the cache's back, a high-performance cache is possible. If you have business logic in the database changing data, it is very difficult to keep the cache valid. Scott Ambler (3) gives other reasons for not implementing business logic in stored procedures, but he doesn't address caching...

Tim Gorman (1, also see below) wrote an article on high availability and scalability. In it he says that distributed databases had performance problems because of network latency. I think a Java app server and a database have the same problem: it doesn't matter how efficiently Oracle caches a record in memory if it has to send that record to the app server over a network. The network will probably kill performance as surely as disk access would.

But maybe I'm all wrong. Maybe business logic should be in the database, and to scale you have your customers purchase Oracle Parallel Server licenses and kick-ass hardware to run Oracle ($$$). Maybe I'm also prejudiced because I'm a Java guy and not a DB guy, or because I prefer coding in Java to PL/SQL. What should an ideal architecture look like? Business logic in PL/SQL, in Java, or both? How critical is caching in the app server? What do you need to think about to cache efficiently? What do you need to do to scale to 10,000 or more users? And if your current architecture implements lots of business logic in PL/SQL, does that imply a full rewrite if you want to move to a business-logic-in-Java approach?

If you have answers to any of those, let me know. Also, feel free to say that some of my arguments are stupid; I can take that. But please also say why, so I can learn. Trying to get me fired (as koert54 suggested) may be valid, but it still leaves me clueless. Please don't.

Do you understand now why I may have asked such a stupid question?

Thanks,
Andrej

References:


(1) Tim Gorman, "High-Availability and Scalability for your Database"
    (http://www.trutek.com/HighAvail&Scalability.pdf)

    On distributed databases: "Also, the expected gains in performance were offset by the performance impact of network latency on each transaction. ... The rise of transactional replication in the mid-1990s was a direct result of the shortcomings of a distributed database. ... Nowadays the concept [of a distributed database] has been completely discredited and should be avoided like a bad Keanu Reeves movie."

(2) Marc Fleury, "Why I love EJBs" (http://www.jboss.org/blue.pdf)

    Marc Fleury, the head architect of JBoss, an open-source J2EE server, addresses caching in an EJB server. His main opinion is "cache is king".

    "What the numbers say is that, for most applications, the time spent in the actual container layers is ridiculously small compared to the RPC invocation, i.e. the network invocation of the EJB, and the high amount of time you can waste in a JDBC database driver. The culprit is the old standby, serialization."

    "How much in-memory work you do equals how little serialization you do. Serialization = 1/cache-hits. You are always up against serialization in JDBC and RPC; local caches of data can minimize the usage of both, it is that simple. No ifs and buts."

    [On EJB cache options A, B, and C] "Option A says you never refresh the cache, which is kind of dumb, and Option B says you always refresh the cache, which is just as dumb, as it means there is no cache. I still don't know what good option C serves really..."

    [He then speculates about a better cache option that JBoss could provide in the future, such as a smart Option A:]

    "... 'smart Option A', where you manually control the validity of the cache and assume, until it is invalidated, that the data is good, yields tremendous speed increases for a large array of use cases."

(3) Scott Ambler, "Mapping Objects to Relational Databases"
    (http://www.ambysoft.com/mappingObjects.pdf)

    "There are, however, several reasons why you don't want to use stored procedures when mapping objects to relational databases. First, the server can quickly become a bottleneck using this approach. You really need to have your act together when moving functionality onto your server - a simple stored procedure can bring the server to its knees if it is invoked often enough."

    "The bottom line is that stored procedures are little better than a quick hack used to solve your short-term problems."

Received on Sun Feb 23 2003 - 17:15:21 CST
