Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: (long) Sniffing redo logs to maintain cache consistency?

From: Noons <nsouto_at_optusnet.com.au.nospam>
Date: 24 Feb 2003 09:37:45 GMT
Message-ID: <Xns932CCF526651FTokenthis@210.49.20.254>


Following up on Andrej Gabara, 24 Feb 2003:

warning: long and boring techo-design stuff follows...

>
> I don't mind if you are blunt. If you tell me that my
> idea is stupid and show me why, I owe you. What were some
> of those "enlighted" but stupid java designs?

As I said, my apologies for the bluntness. Some of those enlightened designs? Here is one:

A "Java/J2EE consultant" recommending that we reduce the database calls to basic SQL and cache the rows in the app server. When we questioned why, we were told that he could then "turn on some nice caching" and have all relational joins done by Java code inside the application server, in "a nice and efficient way".
I will not even go into details of why this is as moronic as can be....

> feedback in this newsgroup. I was asking to get feedback.

Cool. I can live with that.

> We currently have an application server written in Java,
> but most of the business logic is in PL/SQL (workflow engine,
> for example).

Nothing wrong with that.

> Our goal is to enhance our architecture such
> that we can perform and scale better.

You will never ever scale better by moving your database access code and business code to Java. That I can guarantee you. However, this is my opinion and not the industry's. You are entitled to disagree with it and I won't blame you for that.

> Our performance tests
> have shown very high CPU usage of Oracle as well as high
> JDBC overhead.

Have you done ANY tuning of your Oracle and PL/SQL code AT ALL? If not, why do you think a change in architecture is necessary? I'd say up front that scalability of your db server is so easy to achieve as to make any other solution feel like a squeeze...

> High cpu because most of our business logic is
> in PL/SQL,

Sorry, that is not a valid conclusion. Although it may indeed be a desirable thing to have.

> and high JDBC overhead because we have to ship
> lots of data from database to app server. (This architecture
> was not designed by Java programmers, btw.)

You should most definitely not have to ship lots of data to the app server. If your business rules and code are indeed on the database under PL/SQL, then what you have to do is minimize the amount of data that flows back and forth. That is done by using the array interface, object-relational mapping and a few other techniques such as bind variables and smaller objects.

All of these have their own names in JDBC. If you are really interested, or don't have a clue where to start, drop me a line and I'll see what I can ship your way.
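To make the bind-variable and array-interface point concrete, here is a minimal JDBC sketch. The table and column names are invented for illustration; the mechanics (a single `PreparedStatement` with `?` placeholders, plus `addBatch`/`executeBatch` to ship many rows per network round trip) are the standard JDBC way to get this effect.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {
    // executeBatch ships up to BATCH_SIZE rows per round trip instead of one.
    static final int BATCH_SIZE = 100;

    // Bind variables (?) let the server reuse the parsed statement.
    // Hypothetical table/columns, purely for illustration.
    static final String SQL = "INSERT INTO orders (id, amount) VALUES (?, ?)";

    public static void insertAll(Connection con, List<long[]> rows) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            int pending = 0;
            for (long[] r : rows) {
                ps.setLong(1, r[0]);
                ps.setLong(2, r[1]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) { ps.executeBatch(); pending = 0; }
            }
            if (pending > 0) ps.executeBatch(); // flush the last partial batch
        }
    }

    // Network round trips needed for n rows at a given batch size -- the whole
    // point of the array interface is that this is ceil(n/batchSize), not n.
    public static int roundTrips(int n, int batchSize) {
        return (n + batchSize - 1) / batchSize;
    }
}
```

With a batch size of 100, inserting 250 rows costs 3 round trips instead of 250.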

>
> Our app server runs an http server as well as servlet
> engine and communicates to the database. So the tiers are:
> CLIENT --(http)--> APP SERVER --(jdbc)--> ORACLE RDBMS.

Good. Decouple your http from the app server and you got some scalability right there. Increase the efficiency of your PL/SQL code and its own SQL and you'll see the scalability of Oracle RDBMS go through the roof. Install 9ir2 and you'll get virtually unlimited scalability of the db server. No need to change your architecture anywhere.
However, read on...

>
> To scale, we can configure multiple app servers to
> run in a cluster. That helps us scale for http request
> processing, but still lots goes through the database.

Scale the db servers? What's so hard to understand there?

> entities. A small change in an entity invalidates a fat
> object, and this results in a few and heavy JDBC queries.

Go for thin objects. You don't need them fat at all. In fact, it's highly counter-productive for a high efficiency environment.

> changes we must do in order to be able to scale. Also, our
> object model pretty much reflects our data model one-to-one,
> so it is not object oriented, and is not very close to the
> real world we're trying to model.

I'm at a loss now. The object model maps to the data model one to one and you got fat objects to communicate via JDBC. Sounds to me like you're trying to do too much OO stuff inside PL/SQL and too much data access stuff in Java. Exactly the opposite of where you want to be.

> Our architecture is very
> close to the session facade design pattern in the J2EE world
> (bunch of session beans and lots of plain value objects).

Good. Nothing wrong with that. So is ours. Our guys use Jakarta struts and plain good old beans. They gave up on EJB's completely.

>
> Some of the goals I had in mind were:
>
> (1) To reduce the load on the database (for scalability and
> performance) we want to move business logic from the
> database to the app server.

Possible and one way to do it. Sounds to me like you're trying to use too much OO architecture from PL/SQL. Reduce that to a plain SQL layer with maybe some translation/validation, then use a DAO (Data Access Object) approach to access each table via JDBC from the complex objects in the app server beans. Much better way to go.
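A minimal sketch of what that DAO layer could look like. All names here (`Customer`, `CustomerDao`, the PL/SQL package in the comment) are invented for illustration; the point is that the app-server beans only ever see the interface, so the backing store can be a thin JDBC/PL/SQL layer without the beans knowing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object, purely for illustration.
class Customer {
    final long id;
    final String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

// The app-server beans talk only to this interface; whether it is backed
// by a JDBC call into a PL/SQL package or by something else is invisible.
interface CustomerDao {
    Optional<Customer> findById(long id);
    void save(Customer c);
}

// In-memory stand-in, handy for tests. A JDBC implementation would instead
// call the PL/SQL API, e.g. "{ call customer_pkg.get_customer(?, ?, ?) }".
class InMemoryCustomerDao implements CustomerDao {
    private final Map<Long, Customer> rows = new HashMap<>();
    public Optional<Customer> findById(long id) { return Optional.ofNullable(rows.get(id)); }
    public void save(Customer c) { rows.put(c.id, c); }
}
```

Swapping the in-memory implementation for a JDBC one is then a one-line change in whatever wires the beans together.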

>
> (2) To have an object model that closer represents the
> real world, start out with a java object and then
> figure out how to map it to a db; (instead of starting
> out with a relational data model and then mapping
> it to java).

That is a valid approach, but make sure you don't end up with too many rows to describe object states: that is a common trap of that approach.

>
> (3) Use an O/R mapper that is able to automatically persist
> the object --> improve developer productivity.

Let's keep the focus on the real problem, OK? If your problem is scalability, the last thing you need is a "mapper" to "improve developer productivity". Those things don't make their living out of thin air: there is a very high cost in resources to get them going, which is exactly what you don't want if you have a scalability problem!

>
> (4) A finer-grained object model (less invalidation when
> something changed).

Yes. Absolutely. Keep it simple.

>
> (5) Validation queries to keep cache consistent are
> expensive, so avoid them. Would be nice if database
> would let us know what changed.

Do NOT cache data in Java. Cache object instances. There IS a difference! When you go back to the database, make sure you use an API interface to PL/SQL. For updates, make PL/SQL check the original row against the incoming one and let it signal what is different, if anything. If nothing is different, do NOT let it fire off the UPDATE. To do it this way you have to code a SELECT FOR UPDATE OF before every update, so that you can compare the rows and see if they have been changed from what you had been working on. So use that overhead to lock the row and decide if an UPDATE is really needed or not. Use a timestamp column to decide if the row has been changed, use the J2EE transactional model with the pooled connections, and away you go.
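The two decisions in that flow (has someone else changed the row since we read it, and does the incoming data actually differ) can be sketched as pure logic. The row layout below (a value plus a last-modified timestamp) is invented for illustration; in the flow described above, the row is first locked with SELECT ... FOR UPDATE, then these checks decide whether to reject the change or skip the UPDATE entirely.

```java
import java.util.Objects;

// Sketch of the "compare before UPDATE" idea, with a hypothetical row shape.
public class OptimisticUpdate {
    static class Row {
        final String value;
        final long modified; // the timestamp column suggested in the text
        Row(String value, long modified) { this.value = value; this.modified = modified; }
    }

    // True only when someone else changed the row after we read it.
    public static boolean staleRead(Row current, long readTimestamp) {
        return current.modified > readTimestamp;
    }

    // True only when the incoming data actually differs -- otherwise the
    // UPDATE (and its redo) can be skipped entirely.
    public static boolean updateNeeded(Row current, String incomingValue) {
        return !Objects.equals(current.value, incomingValue);
    }
}
```

Pushing these checks into the PL/SQL API itself, as the text suggests, saves the round trip as well as the redo.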

>
> (6) Reduce JDBC overhead, most of it due to network latency.
> Even if Oracle has a record in its memory cache, it is
> expensive to ship that over the network to the app server.

Get a faster network and multiplex it. Use the compare-before-update technique above to reduce the overhead. If you want to scale to the tens of thousands, you MUST have an efficient network: no two ways about it. You can't make something out of nothing.

>
> (7) The object cache in the app server should cache real
> world java objects, not database records.

I could not agree more with this.

> a java business object. The closer the cached objects are
> to what they are being used for, the better.

Precisely.

> I could list more, but those are the main ones. The main
> point I want to make is that caching java objects is very
> critical when implementing business logic on the java
> app server. You don't want to go to the database to load
> data that you need to execute the business logic all the
> time. And validation queries don't help if you have a
> fine grained object model (they pretty much turn off caching).

Absolutely.

>
> So, ideally, we should change our architecture such that
> all changes to data should be done in the app server, in which
> case we know what objects become invalid, and cache
> consistency is simple.

Or simplify your object-to-relational mapping so that you don't have so many heavy objects moving around every time someone flicks a bit.

>
> However, we have so much business logic in PL/SQL that
> it will take us a very long time to convert. Probably more
> time than we can afford. To make matters worse, we have
> customers who like to customize some functionality based
> on triggers and PL/SQL. So there are issues with the
> feasibility of implementing such an ideal architecture (actually,
> it could turn out that this architecture is not ideal, but
> that's what some people claim... and I bought it).

Go for scaling the db server, then. And tune the darn thing!

>
> Therefore, I was wondering if we can implement this
> new architecture gracefully, without having to convert
> everything at once. But, as long as we have business logic
> in the database, it is very difficult to cache java objects
> and keep them valid. At least very difficult to do this
> such that it performs well.

Yes. Stick with the easy bits, tune the db server first.

>
> That's why I was wondering if it is possible to have the
> database tell us when something changed instead of
> having validation queries (ping database)

No. Unless you put yet another layer on top of what you've already got to detect changes. That may cause problems.

>
> An architecture that caches java objects but also has
> business logic in the database doesn't scale when it
> requires validation queries. If the database could push
> change information to the app server (even when it only
> consists of invalidation data, such as table name and pk),
> then having business logic in the database and a java
> object cache in the app server seems possible.

Yes.

>
> Marc Fleury (see below) states "Cache is king". You won't
> get a J2EE application server using entity beans perform and
> scale when you don't cache. As long as you implement all
> business logic in the J2EE app server, and don't have PL/SQL
> changing data behind the cache's back, then a high performance
> cache is possible. If you have business logic in the database
> changing data, then it is very difficult to keep the cache
> valid. Scott Ambler (1) states other reasons for not
> using stored procedures to implement business logic. But he
> doesn't address caching...

Scott Ambler and Marc Fleury rate right up there in lah-lah-land when it comes to understanding issues relating to performance.

First of all: entity beans and EJBs are nowadays recognized by just about everyone in the industry except those two as a *very bad* idea. Do NOT use them. See the serverside for why.

Second: Define business logic before you start placing it. If you read all their stuff, there is not ANYWHERE a proper definition of what constitutes "business logic". It's just thrown in as a nice expression all over the place.

Third: if you are implementing cache yourself in your app server, what you need is a better app server, not to re-invent the wheel. There are HEAPS of app servers on the market already with an object cache. Use them.

Of course object caches are a good idea (ANY cache is a good idea; we don't need these "gurus" "discovering" it, it was found out a mere 45 years ago!!!). But do not put them in the wrong spot for the wrong reasons.

>
> Tim Gorman (also see below) wrote an article on high-availability
> and scalability. In his article he said that distributed databases
> had performance problems because of network latency.

Oh, so scalability achieved through multiple app servers in a network doesn't suffer from latency as well? Must be a new magic...

> I think
> that a java app server and database also have a problem with
> network latency.

Of course they do. ANYTHING that touches a network has a latency problem. ANYTHING! So what?

> It doesn't matter how efficient Oracle is in
> caching a record in memory, if it has to send this record to
> the app server over a network. The network will probably kill
> performance, as much as disk access would.

It doesn't matter how efficient a Java object cache is, if it has to send it across a network to store it persistently in a database, it will kill performance. Serialisation. What else is new?

Instead of re-inventing the world, wouldn't it be more efficient to find out how to work well within the constraints of the existing one?

>
> But, maybe I'm all wrong. Maybe business logic should be in
> the database, and to scale you can have your customers purchase
> Oracle Parallel Server licenses and kick ass hardware to run
> Oracle ($$$). Maybe I'm also prejudiced because I'm a java guy
> and not a db guy, or that I prefer to code in Java rather than
> PL/SQL.

No, you are very right. I just think you've been listening to the right things for the wrong reasons.

Let's look at your problem. You need to tune that PL/SQL code to make it as efficient as possible. Get someone that understands PL/SQL and Oracle to have a look at your schema, your SQL, your db server and your db statistics. I'll bet you anything you want if they are any good they'll find at least an order of magnitude improvement possible with relatively minor changes. THEN, you start looking at RAC and db scalability!

>
> What should an ideal architecture look like?

Like the Opera House in Sydney! (Sorry, I have a strong aversion to the term "architecture" being used to describe "design and analysis")

> Business logic in
> PL/SQL or java, or both?

Both. Define YOUR business logic. See which part is dependent on object layer, which is dependent on data layer. Move it accordingly.

> How critical is caching in the app server?

Very. For objects only.

> What do you need to think of to have efficient caching?

Heaps. It's a very complex field that usually requires close integration with J2EE, JRE, and DB code itself. That's why it is more cost-effective to get something like 9iAS and 9iDB together: it solves the problem a lot faster and with much less pain.
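As one small illustration of why "cache object instances" is more than a one-liner: even the simplest useful policy (bounded size with least-recently-used eviction) takes deliberate design. Here is a minimal sketch using `LinkedHashMap`'s access-order mode; the capacity is an invented example value, and a production cache would also need invalidation and concurrency control, which is the hard part the text alludes to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache of object instances (not raw rows), built on
// LinkedHashMap's access-order mode.
public class ObjectCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public ObjectCache(int capacity) {
        super(16, 0.75f, true); // true = access order, i.e. LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry
    }
}
```

Note that this does nothing about keeping cached objects consistent with the database, which is exactly the problem the whole thread is about.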

> What do you need to do in order to scale to 10,000 or more users?

A very good app server, with lots of shared, pooled connections. A very good design, characterized by VERY SIMPLE transaction design. You simply CANNOT have a complex design that is scalable. That only happens in the Scott Ambler books with the three entity schema...
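The "shared, pooled connections" part can be sketched as a toy fixed-size pool. In real use `T` would be `java.sql.Connection` and you would use the app server's pool rather than rolling your own; the factory and pool size here are invented, and a real pool would block (or time out) instead of returning null when exhausted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Toy fixed-size pool illustrating shared, pooled connections.
public class Pool<T> {
    private final BlockingQueue<T> idle;

    public Pool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get());
    }

    // Callers share a fixed set of instances instead of opening a new
    // connection per request. Returns null if exhausted (a real pool blocks).
    public T acquire() { return idle.poll(); }

    public void release(T t) { idle.offer(t); }

    public int available() { return idle.size(); }
}
```

The payoff is that 10,000 users multiplex over a few dozen database connections, which is what keeps the db server's session count sane.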

> What if your current architecture implements lots of business logic
> in PL/SQL, does that imply a full rewrite if you want to change
> to a business-logic-in-java methodology?

Yes. So, if you don't want to do that or think it's too costly, then hire a good PL/SQL/Oracle person to fix the problem where it exists now, instead of just moving the problem around. Costs even out in the end.

>
> If you have answers to those let me know. Also, feel free to say
> that some of my arguments are stupid. I can take that. But please
> also say why so I can learn. Trying to get me fired (as koert54
> suggested) may be valid, but it still leaves me clueless. Please
> don't.

Very good points. I'm sorry once again for being so blunt at the start. I hope I've redeemed myself somewhat. Any additional help you need I'll be only too glad to contribute.

> (1) Tim Gorman, "High-availability and scalability for your database"
> (http://www.trutek.com/HighAvail&Scalability.pdf)
>
> On distributed databases: "Also, the expected gains in performance
> were offset by the performance impact of network latency on each
> transaction. ... The rise of transactional replication in the
> mid-1990s was a direct result of the shortcomings of a distributed
> database. ... Nowadays the concept [of a distributed database] has
> been completely discredited and should be avoided like a bad
> Keanu Reeves movie."

What Gorman forgets to detail is that distribution is not only a bad idea with RDBMSs, it's also a bad idea with app servers.

Throw in your cache into your app server willy-nilly, then try to scale it and I can promise you right now as big a problem as with any distributed database. For EXACTLY the same reasons. The problem is NOT with databases, it's with the WRONG way to distribute.

>
> (2) Marc Fleury, "Why I love EJBs" (http://www.jboss.org/blue.pdf)
>
> Marc Fleury, the head architect of JBoss, an open-source J2EE
> server, addresses caching in an EJB server. His main opinion is
> "cache is king".

How very right. "Caching what" is of course conveniently omitted.

> do Serialization = 1/cache-hits. You are always up against
> serialization in JDBC and RPC, local caches of data can minimize the
> usage of the both, it is that simple. No ifs and buts."

So, instead of fixing the reason for the need to serialise (the deranged nature of EJBs and entity beans), what we need to do is re-invent the entire edifice of data processing. How quaint. How cheap...

> [On EJB cache options A, B, and C] "Option A says you never refresh
> the cache which is kind of dumb and Option B says you always refresh
> the cache which is just as dumb, as it means there is no cache. I
> still don't know what good option C serves really..."
>
> [Then thinks about a better cache option that JBoss could provide in
> the future, such as a smart option A]

QED. How deranged...

Go to the serverside and see all the discussions there about the crap that EJBs and entity beans really are. You're much better off using other techniques. You do NOT need EJB's and entity beans to scale and have good performance. A common error, perpetuated by Sun and IBM
(both hardware vendors. hint hint?)

>
> (3) Scott Ambler, "Mapping objects to relational databases"
> (http://www.ambysoft.com/mappingObjects.pdf)
>
> "There are, however, several reasons why you don't want to use
> stored procedures when mapping objects to relational databases.
> First, the server can quickly become a bottleneck using this
> approach. You really need to have your act together when moving
> functionality onto your server - a simple stored procedure can bring
> the server to its knees if it is invoked often enough."
>
> "The bottom line is that stored procedures are little better than a
> quick hack used to solve your short-term problems."

We had a monster discussion about precisely this little "pearl of wisdom" about 18 months ago. After much to-and-fro, it turns out now that the Java guys themselves are asking us to store some of the "business" rules in PL/SQL code in the database. This is nothing but deranged drivel from Ambler. He doesn't have the FOGGIEST on how stored procedures work in Oracle and how to take best advantage of them in high volume environments.

The thing you have to remember here is that in the most fundamental analysis, an app server and a db server work very much like a traditional two-tier client/server combination. The problem of high volume throughput in such environments was solved YEARS ago. It's only Ambler's total ignorance of this fact that makes him rattle on like this. Plus of course the little detail that he knows exactly jack about Oracle. His only contact with stored procedures is through the deranged SQL Server crap. No wonder...

'nyways, it's late and I've been on the soap box for far too long. Next?

-- 
Cheers
Nuno Souto
nsouto_at_optusnet.com.au.nospam
Received on Mon Feb 24 2003 - 03:37:45 CST
