
Re: (long) Sniffing redo logs to maintain cache consistency?

From: Noons <nsouto_at_optusnet.com.au.nospam>
Date: 27 Feb 2003 11:19:58 GMT
Message-ID: <Xns932FE0A252F12Tokenthis@210.49.20.254>


Following up on Andrej Gabara, 27 Feb 2003:

> Could not find a good definition on the web,

Bingo! Precisely. But you think you need to "put it out of the database". Put what out, if you can't find a definition anywhere? See what I mean?

Business logic is the set of operations the application code has to perform to respond to a business transaction request.

Data access logic is the set of data constraints and data manipulation logic that defines how pieces of data relate to each other, how they are stored, and how they are retrieved for use by consumers.

> My concern is on (1), because it causes objects in a cache
> to become invalid. I don't worry about (2). Also, I'm talking only
> about caching object states, not caching any calculated data.
> Some of (2) can be done just by a simple DB query, which
> could bypass the cache.
>

A transactional model is what you want. As soon as you use the expression "object states", that is EXACTLY what you are dealing with. It's a very old field, and most of its problems have been worked out by now.

There are many solutions. One that is very dear to the OO brigade (but is impossible to implement) is the concept that all data can be cached and serialization should only be done at startup and shutdown. I can't even begin to describe how insane this concept really is. It used to be trotted out at the beginning of any discussion of transactional models, to explain why transactions themselves were needed!

It does, however, have the advantage that there are no data refresh conflicts with caches. Of course, it is so impractical that it's much better to look at other techniques and forget the OO dream upfront, instead of wasting time chasing dreams: that gets costly... All you have to do is make sure the user is aware of the compromise.

An example solution is laid out below (there are many more!). It is dramatically more efficient for keeping your object cache and your data cache in sync, and it imposes only very small restrictions on user friendliness. It works and it is dirt-simple to implement.

> Do you agree that (1) is a problem for effective caching of
> those java objects?

No, not at all. An object cache is NOT the same as a data cache.

> The problem is with stored procedures
> that update data in the tables affecting cached objects,
> without being obvious to the app server what those changes
> are when calling the stored procedure.
>

There you go again. The stored procedures do NOT update data that is in cached objects on their own initiative. Stored procedures have no willpower of their own!

All they can do is respond to a call from Java to go and do something. That's all. You initiate the procedure call from a business object in the cache, through the DAO.

Use this transactional model:

you create your objects in your cache. In Java. Do NOT concern yourself yet with the db side of them. Just make sure you've got your object model sorted out. And do NOT, I repeat, do NOT include hierarchical data dependencies in your object model. Those are data dependencies, not object dependencies!

That is by FAR the most common error in mapping objects to relational. Yes, I know, all the books say that OO supports hierarchical data very well. True, if all it has to do is work with the 3-table models you see in all the examples AND it doesn't have to serialise objects.

Otherwise, you are in deep caca if you try to include hierarchy handling in your object model. It will all become clear with an example.

Let's look at the example you gave:

> A user opens up a project, looks up his task,
> and sets the task state to "completed". The business logic will
> change the task's state to "completed" and updates any
> dependent tasks, setting those to "ready".

OK, there are steps here, aren't there? There is a series of steps that need to be handled by the Java logic IN the object cache, and there are data consequences of those steps that are handled by the data layer.

Let's assume you have a tasks table. The tasks table needs to support hierarchical dependencies. Ie, a task may have dependent tasks. But that is a data rule. We handle it with an FK constraint pointing back to the same table.
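For concreteness, such a table could look something like this (a sketch only: the table and column names are mine, purely for illustration):

  CREATE TABLE tasks (
      task_id       NUMBER        PRIMARY KEY,
      depends_on    NUMBER        REFERENCES tasks (task_id),  -- self-referencing FK: the task this one depends on
      state         VARCHAR2(20)  NOT NULL,   -- e.g. 'pending', 'ready', 'completed'
      last_updated  TIMESTAMP     NOT NULL    -- compared on every update; see below
  );

The last_updated column is what will do the synchronisation work further down.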

So, you NEED something that will read/update/re-write/delete from that data model in a consistent manner such that it acts like a simple task table for your workflow app.

The process that does that is data access logic and has NOTHING to do with business logic. It is this process that you should be using your stored procedures for. And it should be handled (called, results verified, etc) through a Data Access Object in your DAO layer.

The steps in your use case above are handled by methods in objects cached in your Java cache.

They initiate a transaction by grabbing a transaction context out of J2EE. This will now be in action until the business code has terminated the transaction. Nothing new here.

Now, updating the task's state is trivial. The problem is when you have to "cascade" to other dependent tasks. How do you handle that WITH the Java cache?

Answer: you don't. You let the database handle that. It's already defined there, you don't need to duplicate it in your objects!
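For example, the db can walk the entire dependency chain straight off that FK, with zero help from the Java side. A hypothetical sketch against the tasks table above, pulling all direct and indirect dependents of a task:

  -- Illustrative only: walks the dependency hierarchy defined by the FK.
  SELECT task_id, state
    FROM tasks
   START WITH depends_on = :completed_task_id
  CONNECT BY PRIOR task_id = depends_on;

The dependency rule lives exactly once, in the data model, and everything that needs it gets it from there.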

In fact, if you did duplicate it in your objects, you'd have to replicate the ENTIRE data model in your Java objects, and cache ALL data at ALL times. I can GUARANTEE you such an application would NEVER scale to 10000 users, because hardware that can cost-efficiently handle that sort of cache does not exist. Today, or in the near or far future.

So, stop right there: all you are doing is buying yourself a bucket-load of problems in trying to convince someone that their terabyte db has to live in memory all the time. You will NOT manage to do it, believe me!

OK, so how do you handle the problem?

Let's follow the steps of a possible solution.

User asks for a task to be marked as complete.

Your object code does whatever it has to do to make sure the screen shows that task as complete. And it eventually calls the DAO to go and make the change to the task table in the db. The DAO now calls the stored procedure to go and change the task to complete, passing it the data (timestamp included) that originally populated the object instance in the object cache.

The first thing the stored procedure does is compare the timestamp of the data in the object cache (obtained from the db the last time the data was read into the object cache) with the timestamp of the row as it is in the db right now. If they are different, the update is IMMEDIATELY REJECTED and the DAO passes an "invalidate object" back to the cached object.

The object reacts by telling the user "the data in your screen is stale, please re-query". The user obediently re-queries and all is well, he gets the real data now. The task he wanted to complete is already complete, so he doesn't have to do anything, just go home. (Not really, but you get the picture.)

If the timestamps are the same, the code in the stored procedure marks the task as complete and updates its timestamp. AND it cascades: any dependent tasks get updated as well (set to "ready", as in your use case), with changes to THEIR timestamps. This is all done in the db, by the stored procedure, with no intervening JDBC anywhere.

When finished, it returns to the DAO a status saying: "I have cascaded!".
The DAO returns that to the object in cache and a method somewhere in there is activated to send a little blink to the user saying "this has caused dependent tasks to be changed". Just a nice warning. There are possibilities in there for other handling too, but I won't go into that now.

Now the user keeps on working and eventually selects one of the tasks that was sitting in the object cache but was changed in the db as a result of the cascade. The object cache knows nothing about that change.

He then tries to set it to complete. The object code calls the DAO, the DAO calls the stored proc, the proc checks the timestamps and BANG!, we are back at step one of this little exercise.

See? Bingo, there is your object cache to data cache sync, done on the cheap, efficiently, and with minimal impact on the user.
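To make the whole thing concrete, here is a bare-bones sketch of what such a stored procedure could look like. It uses the hypothetical tasks table from earlier; the procedure name, the status values and everything else are mine, purely for illustration. Error handling is stripped to the bone, and commit/rollback stays with the J2EE transaction context, as described above:

  -- Sketch only: all names are illustrative, not from the original design.
  -- Transaction control (commit/rollback) is left to the caller, i.e. the
  -- J2EE transaction context the business code grabbed earlier.
  CREATE OR REPLACE PROCEDURE complete_task (
      p_task_id   IN  NUMBER,
      p_cached_ts IN  TIMESTAMP,   -- timestamp the object cache got when it last read the row
      p_status    OUT VARCHAR2     -- 'STALE', 'DONE' or 'CASCADED'
  ) AS
      l_db_ts  tasks.last_updated%TYPE;
  BEGIN
      -- Lock the row and fetch the timestamp it has in the db right now.
      SELECT last_updated INTO l_db_ts
        FROM tasks
       WHERE task_id = p_task_id
         FOR UPDATE;

      -- The check: if the cached copy is stale, reject IMMEDIATELY.
      IF l_db_ts <> p_cached_ts THEN
          p_status := 'STALE';   -- the DAO turns this into "invalidate object"
          RETURN;
      END IF;

      -- Cached copy is current: mark the task complete and stamp it.
      UPDATE tasks
         SET state = 'completed', last_updated = SYSTIMESTAMP
       WHERE task_id = p_task_id;

      -- The cascade: direct dependents change too, with THEIR timestamps.
      UPDATE tasks
         SET state = 'ready', last_updated = SYSTIMESTAMP
       WHERE depends_on = p_task_id;

      p_status := CASE WHEN SQL%ROWCOUNT > 0 THEN 'CASCADED' ELSE 'DONE' END;
  END complete_task;
  /

The DAO just inspects p_status: 'STALE' becomes the "invalidate object" message, 'CASCADED' becomes the little blink to the user.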

> For a business logic case like (2) a stored procedure would
> probably outperform Java, because the stored procedure is
> always closer to the data it is accessing. And a lot of
> those accesses can be multiple complex queries.

It will ALWAYS outperform Java, by several orders of magnitude. For many reasons I won't go into here, but mainly to do with how JDBC works and how J2EE performs I/O via EJBs.

> (1) it may not be, because of cache coherency. But if most
> objects are cached in the app server and are valid, then
> Java for case (2) could also outperform PL/SQL;

Of course! You wouldn't even GO to the db if the objects were ALWAYS in the cache, would you?

You will find, however, that in the real world, with 10000 users on the system, the chances of having sufficient hardware to cache "most objects" like you say above are very remote! And all it takes is one object not being in the cache for the whole thing to crumble...

> model and not the data model. That is why I favor a model
> where most business logic is implemented in Java and none
> or very little in PL/SQL.

One thing should not have anything to do with the other, as I hope I have explained.

And a model that needs all object instances to be permanently in cache in order for it to be workable is to me something that is seriously flawed.

And that's me not using the correct expression, which would be "deranged"! That is the problem with the "everything in Java" model: it simply cannot work with real-world hardware and real-world volumes.

Not to mention that little problem called TCO: there is no way you can convince anyone to spend the moolah for that sort of "scalable" solution...

Of course, it works PERFECTLY in the 3-table examples with two rows each. That is not, however, the real world, I'm afraid!

> To be able to take advantage of
> an effective java object cache, you have to make good
> decisions of where you put the business logic.
>

But you also have to define and separate what is business logic and what is data access logic. There is NO WAY in the world you can reliably scale a model that relies on having an entire database in memory! That is just "Java-guru lah-lah-land".

It's total bull: no commercial software/hardware combo ANYWHERE operates that way and is scalable to any significant number of users while being reliable, resilient and cost effective.

Don't waste your time with snake-oil solutions: they have the bad habit of turning around and biting your hip-pocket. EVERY SINGLE TIME!

-- 
Cheers
Nuno Souto
nsouto_at_optusnet.com.au.nospam
Received on Thu Feb 27 2003 - 05:19:58 CST
