Pro*C timeout

From: Mark N <google_at_nevars.com>
Date: Mon, 27 Aug 2007 15:02:44 -0700
Message-ID: <1188252164.965234.327030@57g2000hsv.googlegroups.com>

I have a UNIX executable that connects to a local database. Some of the queries must be done on another UNIX machine. This is accomplished with a call to a stored procedure that has a synonym defined with a database link. When the network behaves, all is good. Occasionally the network has some issues and access to the remote database "hangs" the executable. This executable receives and sends messages to other executables in the overall application. When enough time goes by, it can cause the application to enter a race condition as message queues fill up.

This was the state about 8 years ago when this call to a remote stored procedure was implemented. We were using Oracle 7 at the time. After the application was released to the field we started getting intermittent lockups, and we traced them to these remote database calls. At the time I could find no way to set a timeout for a query. I worked around the problem with an alarm signal. When I get the alarm, I call longjmp() to send control back to a starting point (the setjmp() origination point). The longjmp() seemed to leave the Oracle server process in a strange state, as any further db access calls (either to the local or remote database) return "not connected." So I first disconnect, then reconnect and continue on.

This solution, while a bit ugly, worked fine through Oracle 8.0.6, 8i, and 9i. We had to make one adjustment. Our customers did not like the side effect of stranded Oracle processes. It seems the server process that was longjmp'ed out of remains as a UNIX process even after the disconnect. Not that big a deal, except that it prevents a SHUTDOWN IMMEDIATE from completing. In response to that customer objection, we now query Oracle (at connection time) for the PID of the server process, and we issue a system kill command after the longjmp() and before the reconnect. More ugliness, but it worked.

Now that we have upgraded to 10g, the longjmp() leaves the executable in a strange state. Left as is (longjmp() followed by the system kill command followed by disconnect/connect), the executable dumps core during the disconnect. If I remove the kill command, the executable hangs for a few minutes, then inexplicably continues with the new connection. The old server process goes away with the disconnect. I should mention that we reproduce this in our lab by putting the 2 machines on a private LAN and introducing errors and delays into the network with a PC tool. This tool is about 99.9% effective: every once in a while a query makes it through. This leads me to believe the long delay is really the disconnect hanging until it can get through.

Anyway, I'm looking for a different way to break out of a long running query. Anything you can suggest will be wonderful.

Thanks.

Received on Mon Aug 27 2007 - 17:02:44 CDT