Re: java stored procedures fast, but slow when called as SQL function

From: Peter J. Holzer <hjp-usenet_at_hjp.at>
Date: Sat, 19 Apr 2003 14:03:42 +0200
Message-ID: <slrnba2eou.ptu.hjp-usenet@teal.hjp.at>

On 2003-04-18 14:06, Noons <wizofoz2k_at_yahoo.com.au.nospam> wrote:
> Following up on Peter J. Holzer, 18 Apr 2003:
>
>>> A JVM program MUST have the JVM started BEFORE it can execute.
>> Same for VBA. Code not yet loaded cannot be executed :-)
>
> Not at all the same. A VBA program does NOT need the VBA
> DLL interpreter running before it itself can start. In fact,
> it is the VBA program that loads the interpreter DLL.

Or so it seems. I am not a Windows guy, so I don't know what VBA really does, but there are basically two ways to do it:

1. The VBA compiler produces an EXE file (the extension doesn't have to be .EXE, it just has to be recognized as a native executable) which contains the interpreter (which may be in a shared library) and the byte code. When you execute the file, the interpreter is started and executes the byte code. This is an old trick (I saw "basic compilers" do this at least 15 years ago) and it is also used, e.g., for self-extracting zip files.

2. A registry entry links the extension of VBA files to the VBA interpreter. When a VBA file is invoked, the OS knows that it has to start the VBA interpreter first. This is what happens when you double-click on a Word file, for example (the interpreter in this case is MS-Word). (I don't know if this works outside of the GUI, though.)

In both cases the VBA program proper (i.e., the byte code) only starts to run after the interpreter is loaded. (It can't be any other way, if you think about it.)

> A Java program will NOT start unless the JVM is running.
> And it is the JVM that loads and starts the main class.

Both tricks would be possible with Java, too. If Sun chose not to implement them in a default installation of the JRE, that is a minor inconvenience for the developer (he has to provide an extra batch file or a few registry settings), but it has no influence on the working of the interpreter itself. The Linux kernel, for example, contains a bit of code to recognize class files and invoke the java interpreter (just like it invokes the perl interpreter when somebody tries to execute a file starting with "#!/usr/bin/perl"). It's turned off by default and usually regarded as a curiosity. Providing a one-line shell script is regarded as an acceptable burden and works on any Unix.

>>> You can't 
>>> start ANOTHER program in that same JVM address space unless the first 
>>> program does that for you. 
>> 
>> Not sure what that is supposed to mean nor whether that's any different
>> from VBA.
>
> Ever tried to start batch programs from inside a Java
> program?

Yes.

> Not easy, eh?

Depends. If you want to communicate with them via stdin/stdout, that's actually easier than in C (don't know about VB). My Java is very rusty (never done much Java programming and nothing in the last 2 or 3 years), but it just took me about 15 minutes (including finding a machine which has a java compiler installed and reading the docs for the Runtime, Process and InputStream classes) to write a little program which invokes an external command (specified on the command line) reads its stdout and dumps it in decimal to stdout.

It may be a bit more difficult to invoke the subprocess with its stdin/stdout/stderr the same as the process (which is default on UNIX), but I guess another 15 minutes of rummaging through the docs would tell me how.
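A rough sketch of what such a little program might look like follows. It is not the program mentioned above (that was never posted); the names DumpStdout and runAndDump are made up for illustration, and the sketch only reads the child's stdout, so a command that writes a lot to stderr could block.

```java
import java.io.IOException;
import java.io.InputStream;

class DumpStdout {
    // Runs the given command and returns the bytes it wrote to stdout,
    // rendered as space-separated decimal numbers.
    static String runAndDump(String[] command) throws IOException, InterruptedException {
        Process child = Runtime.getRuntime().exec(command);
        StringBuilder out = new StringBuilder();
        // getInputStream() is the stream connected to the child's stdout.
        try (InputStream in = child.getInputStream()) {
            int b;
            while ((b = in.read()) != -1) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(b);
            }
        }
        child.waitFor(); // reap the child before returning
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The external command and its arguments come from the command line.
        System.out.println(runAndDump(args));
    }
}
```

For the second case (the child sharing the parent's stdin/stdout/stderr), later Java versions (7 and up) offer ProcessBuilder.inheritIO(), which did not exist when this was written.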

>> Every interpreter does that. An interpreter takes an instruction stream
>> in a particular instruction set and executes it according to the rules
>> for that instruction set. That is "completely isolated" from the OS. The
>> interpreted program can never do anything except through the
>> interpreter. The interpreter can implement instructions such as "open
>> file", "execute program" or "call function X in DLL Y", but these have
>> to be implemented explicitly - that's not something "just an interpreter"
>> comes naturally with. 
>> The authors of the Java class library chose to
>> implement these functions with an added layer of permission checking
>> (the "sandbox"). That makes Java a bit more complicated, but it doesn't
>> change the fact that it is an interpreter nor makes it any more
>> "virtual machine-like".

>
> An interpreter takes indeed an instruction stream in a given
> instruction set AND executes it according to the rules of the
> OS it is running in. It doesn't NEED to do anything else
> for the program.

No. The OS doesn't have anything to do with the semantics of the byte code. Whether the instruction 42 means "create a new integer variable" or "pop two floating point values from the stack, divide them, and push the result" or "invoke a routine in a shared library" is completely up to the interpreter. That doesn't change if you change the OS. The OS enters into the equation as soon as you ask for its services. If the interpreter (on behalf of the interpreted program) tries to open a file, the OS will enforce its rules for file names, permissions, etc. But that (usually) is outside of the specification of the interpreter.
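To make that concrete, here is a toy stack interpreter, purely as an illustration. The instruction set is invented for this sketch (it is not JVM or VBA byte code): opcode 42 is arbitrarily defined as "pop two floating point values, divide them, push the result", and nothing in the loop asks the OS for anything.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyInterpreter {
    // Hypothetical opcodes; their meaning is fixed by this interpreter alone.
    static final int HALT = 0;  // stop and return the top of the stack
    static final int PUSH = 1;  // operand: index into the constant table
    static final int DIV = 42;  // pop divisor and dividend, push the quotient

    static double run(int[] code, double[] constants) {
        Deque<Double> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack.push(constants[code[pc++]]);
                    break;
                case DIV: {
                    double divisor = stack.pop();
                    double dividend = stack.pop();
                    stack.push(dividend / divisor);
                    break;
                }
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        double[] constants = {84.0, 2.0};
        int[] program = {PUSH, 0, PUSH, 1, DIV, HALT};
        System.out.println(run(program, constants)); // prints 42.0
    }
}
```

The same program array gives the same result on any OS. The OS would only come into play if an opcode such as "open file" were added and implemented by calling into it.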

> A VM does a lot more than that. In JVM's case, the clearest
> example is the garbage collection. Which an interpreter doesn't
> have to do, it can be done at OS malloc level.

An interpreter doesn't even have to use malloc, if the language it interprets doesn't contain dynamic memory allocation. However, the interpreters for most high-level languages I can think of at the moment do garbage collection. Even the BASIC interpreters I used on home computers in the 1980s did collect garbage. The one exception is the UCSD p-machine, which didn't. And the p-machine was most definitely a "virtual machine" - an emulation of a (hypothetical) processor. (As a side note, when I first read about Java, my first thought was "Oh, cool! They reinvented the p-machine from the 1970s and are hyping it as something new.")

> Sun has chosen to make it part of the "virtual machine"
> environment in which Java programs run. That way they can
> provide garbage collection even if the OS they are running
> under doesn't have that ability.

The ability to do garbage collection has a lot to do with the language and little with the OS. In PL/SQL and BASIC (the original BASIC from the 1960s, not VB) it is simple, because there is no explicit memory management in the language and there are no recursive data structures. A simple reference counter system is sufficient. In languages like Java and Perl it becomes harder, because they have pointers. However, the run-time system still knows which variables are pointers, so it is possible (although expensive) to determine for any block of memory whether it is in use or not. For C, this is generally not possible, because a variable may be a pointer or an integer and the run-time system cannot distinguish between them (in "normal" implementations - the C standard is sufficiently vague to allow "tagged" data types, and I think some C interpreters like Saber-C used them. Normal "compile to native machine code" implementations don't for the sake of run-time efficiency).
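The difference between the two approaches can be sketched with a toy heap model. This is an illustration only and not how any real JVM or Perl implementation works; Block, link and reachable are names invented for the sketch. Two blocks pointing at each other keep a nonzero reference count forever, while a scan that starts from the known pointer variables (the roots) still finds that they are unused.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GcSketch {
    // A block of memory that may hold pointers to other blocks.
    static class Block {
        final String name;
        final List<Block> pointers = new ArrayList<>();
        int refCount = 0; // what a reference-counting collector would track
        Block(String name) { this.name = name; }
    }

    // Store a pointer to 'to' inside 'from' and bump the target's counter.
    static void link(Block from, Block to) {
        from.pointers.add(to);
        to.refCount++;
    }

    // Tracing: because the run-time knows which fields are pointers,
    // it can follow them from the roots and mark every block in use.
    static Set<Block> reachable(List<Block> roots) {
        Set<Block> marked = new HashSet<>();
        Deque<Block> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Block b = pending.pop();
            if (marked.add(b)) {
                pending.addAll(b.pointers);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Block root = new Block("root");
        Block a = new Block("a");
        Block b = new Block("b");
        Block c = new Block("c");
        link(root, a);
        link(b, c); // b and c form a cycle that nothing else points to
        link(c, b);
        Set<Block> live = reachable(Arrays.asList(root));
        for (Block x : Arrays.asList(a, b, c)) {
            System.out.println(x.name + ": refCount=" + x.refCount
                    + " reachable=" + live.contains(x));
        }
    }
}
```

In this model b and c each keep a count of 1, so a pure reference counter would never free them, whereas the scan reports them as unreachable. The price is that the scan has to walk every live block.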

> To come back to the original:
>
> The PL/SQL interpreter takes advantage of the environment
> it's running in and knows about it. It doesn't have to
> run in any other environment. It can therefore be very
> efficient in its interaction with it.
>
> If Oracle wants to keep its JVM under the standard imposed by
> Sun (which they have to or they break the licence agreement),
> then they have to make it independent from the fact that it's
> running inside the database code. It will never therefore be
> as integrated with the database as PL/SQL can be.

Right. I already commented on this in
<slrnb9jkgf.sgj.hjp-usenet_at_teal.hjp.at>.

> Nor would it be desirable to introduce database dependencies
> into the JVM. For example, the db traffic must always go
> through the same mechanism of any other Java runtime: JDBC.
> Of course, Oracle could provide a special class for its
> own JVM to totally bypass JDBC. Doubt they will, as this
> would make any program written in its JVM non-portable
> and Sun wouldn't like that.

Oracle already introduced the oracle.* classes and the SQLJ preprocessor. There is a good chance that a lot of Java code written for Oracle isn't portable. So far (AFAIK) Sun hasn't complained (they might have complained if MS had done the same thing, though :-).

> This will always make it slower to interface SQL to Java than it
> is to interface SQL and PL/SQL. Hence the timings you see
> when calling Java functions.

I didn't see them; that was <pete_at_mynix.org>. I was just one of the people speculating about why he saw them (and if you read <slrnb9jkgf.sgj.hjp-usenet_at_teal.hjp.at>, you will see that I agree that some of the performance difference is inevitable).

I just happen to disagree with you about the reason for the performance difference and about the definitions of the words "interpreter" and "virtual machine". The definitions are of course a purely semantic issue (if I call a table a cat, it will confuse people, but I can still put my cup of coffee on it). The reasons for the performance difference, however, have practical relevance for programmers. If they know what the interface between the JVM and the SQL engine in Oracle looks like, they can decide what is fast and what is slow, and what will likely stay the same in the next version of Oracle and what might change.

-- 
   _  | Peter J. Holzer    | Latein ist das humanoide Äquivalent
|_|_) | Sysadmin WSR       | zu Fortran.
| |   | hjp_at_hjp.at         |
__/   | http://www.hjp.at/ |    -- Alexander Bartolich in at.linux

Received on Sat Apr 19 2003 - 07:03:42 CDT