Re: Java stored procedure performance dilemma

From: Colin McGuire <colinandkaren_at_lycos.co.uk>
Date: 20 Oct 2002 08:38:51 -0700
Message-ID: <ab6cea37.0210200738.53047353@posting.google.com>

All, thank you for your comments. I've kept quite listening to the opinion developing over the past week and think it appropriate to make the following comments at this time.

I have not investigated using a Java native compilation option. In the next few days I will attempt to scope out the nature of Oracle/native compiled code integration (how it is achieved etc).
Despite having considerable expertise and resource, because of the nature of the problem (basically string manipulation and calculations) we have decided against using PL/SQL as the sole programming language. Our straw model (crude demonstration or proof-of-concept) used PL/SQL. Toward the end of the exercise we had all concluded that PL/SQL did not offer sufficiently rich string manipulation functionality to consider it as a serious programming language candidate for this task.
The Java can be thought of as being deployed as Java stored procedures (actually it is deployed as a cartridge but this is an extra level of complexity that doesn't need to be considered for problem resolution).
The processing has to be done server-side, not client side primarily because for the tasks to be done server side would result in appreciable and avoidable network traffic.
Our regular Java/DB guru's have performed considerable tuning optimisations but as it stands the system we have developed is far from interactive.
The bottleneck in the application is not the database functionality but our own.

The problem is very complex and I describe a very very simple overview.

We are providing additional query functionality to a very large database of protein data. We have no control of the format of this existing data and the design cannot be altered. Simply the proteins are represented in the database as strings of single letters, one letter for each amino acid. Thus, the small protein ACGACHASP represents a protein containing 9 amino acids where each letter represents a known abbreviation for an amino acid. The characters ACGACHASP are stored as VARCHAR2's, or when greater than 4000 characters, CLOB's (when < 4000 characters the CLOB is an EMPTY_CLOB, when > 4000 characters the VARCHAR2 is NULL).

ie CREATE TABLE peptides (protein VARCHAR2(4000),

                          proteinlarge CLOB
                         );

And there is another table, and this table has predefined peptide fragments for each peptide in the peptides table. Although the table below isn't the table definition, it serves to describe the problem.

CREATE TABLE peptideFragment (proteinRID ROWID, //the rowid of the peptide

                                                //in the the peptides
table
                              fragment VARCHAR2(50)  //eg. CGA
                             );

This table contains lists of recognised protein fragments, such as CGA, and the ROWID of the peptide that contains this fragment in the peptide table.

And the problem -- users make database requests to return all the peptides containing certain fragments. This worked quickly and efficiently and addressed the business requirements. This is what they use their proprietary software for, and they want to continue using it. Now they want to return a hitset for peptides containing the same fragments but if they search for two or more fragments, they want to be able to specify how close they are to each other. In otherwords, they don't want to return a peptide simply containing fragment 1 and fragment 2, but fragment 1 within, for example, 3 amino acids of fragment 2. And there are many more examples than this and I hope you can see where the need for considerable string manipulation comes in.

Our Java solution returns to the client app a recordset containing only results from the valid request. So what we do is perform a query and then process the resultset culling out records that don't match the criteria from those that do. Typically processed recordsets have only 30-40 hits returned to the client app, whereas the recordset processed has typically 300-400 hits. The bottleneck is the processing of the string data from the database query, and this key functionality is written in Java.

So, a conceptually simple problem that, because of our implementation (Java) and the inability to change the design, runs like a dog.

Colin

"Finn Ellebaek Nielsen" <_remove_fen_remove__at_changegroup.dk> wrote in message news:<3db125a3$0$97629$edfadb0f_at_dspool01.news.tele.dk>...
> Hi Colin.
>
> A few suggestions:
>
> - Have you tried using Java native compilation (NCOMP) in order to see if
> that speeds your Java stored procedures sufficiently up?
> - When doing string manipulations, do you use String or StringBuffer? You
> should use StringBuffer directly when possible since that's much more
> efficient.
> - Have you configured the JVM with sufficiently large memory pools (init.ora
> parameters JAVA_POOL_SIZE, LARGE_POOL_SIZE, JAVA_MAX_SESSIONSPACE_SIZE,
> JAVA_SOFT_SESSIONSPACE_LIMIT)?
> - Have you done internal profiling (tracing/logging messages) to determine
> which part of your code is slow?
>
> HTH.
>
> Finn
>
> "Colin McGuire" <colinandkaren_at_lycos.co.uk> wrote in message
> news:ab6cea37.0210161156.423d90ba_at_posting.google.com...
> > I am involved with small team of programmers that have just finished
> > writing a suite of Java routines that perform some grunt-job string
> > manipulation/processing :- the algorithms implemented as Java stored
> > procedures. Unfortunately, at this late stage in the play, performance
> > has become an issue and we realise we have probably made an incorrect
> > technical decision early on. Our reasons for initially choosing Java
> > as the development environment were the familiar ones but platform
> > independence should not have been one of the reasons as our code is
> > intimately tied to Oracle. During a recent technical meeting it was
> > mentioned one solution might be to rewrite (ie translate) our library
> > routines from Java to Pro/C++, the latter I believe shipped with all
> > flavours of Oracle and thus would run on every machine Oracle ran on,
> > compile to fast native code etc. We have decided on having a think
> > about this approach, weigh up the pro's and con's, and discuss again
> > from an informed perspective at the next meeting.
> >
> > I would be interested in opinions from interested parties in the
> > forum, other alternatives to our dilemma etc.
> > Thank you
> > Colin
Received on Sun Oct 20 2002 - 10:38:51 CDT