Zombie SQL*Net processes

From: Steve Hole <steve_at_edm.isac.ca>
Date: Fri, 10 Jul 92 15:42:19 GMT
Message-ID: <1992Jul10.154219.1702_at_edm.isac.CA>


We have run across a problem with the orasrv (SQL*Net TCP/IP) server for SunOS (and possibly for other Unix-based servers). Actually, it probably isn't orasrv at all, but the ORACLE server process itself (for those who don't know, orasrv spawns an ORACLE SQL*Net server process for each connection request that it receives).

When a connection to a remote client - generally a PC - is broken without going through the regular shutdown process, the server process remains on the server machine. Eventually you get enough of these things to cause a problem, especially if you have a rather small server. As everyone knows, switching off or rebooting a PC to solve a problem is a very common occurrence.

Typically, for a TCP socket connection, once either the server or client process detects that the connection is gone, then the process should clean up and die pleasantly - perhaps issuing some reasonable error message to a logfile. The first time a process tries to read or write to the defunct TCP channel should result in the detection of the dead connection.
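To make the expected behaviour concrete, here is a minimal sketch in Python of a server-side read loop that notices a dead connection and cleans up. It is only an illustration of the general socket idiom, not of how the ORACLE server process is written; the function name `serve_until_disconnect` and the use of `socket.socketpair()` to stand in for a client connection are assumptions made for the example.

```python
import socket


def serve_until_disconnect(conn: socket.socket) -> str:
    """Read from conn until the peer goes away; return the reason we stopped.

    Hypothetical helper for illustration only.
    """
    try:
        while True:
            data = conn.recv(4096)
            if not data:
                # An empty read means the peer closed its end of the connection.
                return "peer closed connection"
            # A real server would process the request in `data` here.
    except OSError as exc:
        # Covers resets and other socket-level errors reported on read.
        return f"connection error: {exc}"
    finally:
        # Always release the socket before the process exits.
        conn.close()


if __name__ == "__main__":
    # socketpair() stands in for an accepted client connection (assumption).
    server_end, client_end = socket.socketpair()
    client_end.sendall(b"request")
    client_end.close()
    print(serve_until_disconnect(server_end))
```

Note that this sketch only covers the case where the peer's TCP stack gets the chance to send a close or a reset. A machine that is simply switched off sends nothing, and a read that is already blocked would then wait indefinitely.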

This leads me to suspect that the ORACLE server processes do not check the connection on a regular basis to see whether it is still active. It seems to me that this would be a function of the SQL*Net protocol itself. Several other application-level protocols support a "keepalive" operation that checks whether a link is still up. TCP itself will return an error on a down link as soon as the process tries to use it.
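For comparison, TCP has its own optional keepalive mechanism, which a server can request per socket. The sketch below, in Python, shows the standard `SO_KEEPALIVE` socket option being switched on; whether orasrv or the ORACLE server process sets it is not something I know, and the helper names `enable_keepalive` and `keepalive_enabled` are made up for the example. The probe timing is decided by the operating system and, by default, is typically on the order of hours of idle time.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Ask the TCP stack to probe an idle connection (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def keepalive_enabled(sock: socket.socket) -> bool:
    """Report whether SO_KEEPALIVE is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        enable_keepalive(s)
        print(keepalive_enabled(s))
    finally:
        s.close()
```

With the option set, a process blocked in a read on a connection whose peer has vanished eventually gets an error instead of waiting forever, at which point it can clean up and exit.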

Questions:

  1. Has anyone else encountered the problem, and if so, what did you do about it?
  2. Is there a fix available?
  3. Are my guesses about the cause of the problem correct?

For a short-term fix, the problem is easy enough to solve by regularly hunting down and killing the offending processes - I have written a nasty little script to do this. However, this offends my sense of correctness, and I would prefer a direct solution.

Received on Fri Jul 10 1992 - 17:42:19 CEST
