Re: Zombie SQL*Net processes

From: Ian A. MacGregor <ian_at_jupiter.SLAC.Stanford.EDU>
Date: 14 Jul 92 15:34:36 GMT
Message-ID: <4640_at_unixhub.SLAC.Stanford.EDU>


In article <1992Jul10.154219.1702_at_edm.isac.CA>, steve_at_edm.isac.ca (Steve Hole) writes:
|> We have run across a problem with the orasrv (SQL*Net TCP/IP) server for
|> Sun OS (and possibly for other Unix based servers). Actually, it
|> probably isn't orasrv at all, but the ORACLE server process itself (for
|> those who don't know, orasrv will spawn an ORACLE SQL*Net server process
|> for each connection request that it receives).
|>
|> When a connection to a remote client - generally a PC - is broken
|> without going through the regular shutdown process, the server process
|> remains on the server machine. Eventually you get enough of these
|> things to cause a problem, especially if you have a rather small server.
|> As everyone knows, turning off or doing a reboot to solve a problem on a
|> PC is a very common occurrence.
|>
|> Typically, for a TCP socket connection, once either the server or client
|> process detects that the connection is gone, then the process should
|> clean up and die pleasantly - perhaps issuing some reasonable error
|> message to a logfile. The first time a process tries to read or write
|> to the defunct TCP channel should result in the detection of the dead
|> connection.
|>
|> This leads me to suspect that the ORACLE server processes do not check
|> the connection on a regular basis to see if it is active or not. It
|> seems to me that this would be a function of the SQL*Net protocol
|> itself. Several other application level protocols support a "keepalive"
|> operation that checks the status of a link to see if it is still going.
|> TCP will immediately return an error on a down link.
|>
|> Questions:
|>
|> 1. Has anyone else encountered the problem, and if so, what did you do
|> about it?
|>
|> 2. Is there a fix available?
|>
|> 3. Are my guesses about the cause of the problem correct?
|>
|> For a short term fix, the problem is easy enough to solve by regularly
|> hunting and killing the offending processes - I have written a nasty
|> little script to do this. This offends my sense of correctness however,
|> and I would prefer the direct solution.

The problem you describe is well known. It is restricted to clients whose operating systems lack pre-emptive multitasking. You will not see it when connecting two Unix boxes, nor when connecting a Unix box and a VMS box. The difficulty is this: how can MS-DOS respond to a keepalive signal while it is busy processing another request?

Oracle7 has a method of allocating system resources on the server. One of these resources is "idle time". I have not tried this method, but it appears that by setting idle time to, say, 60 minutes, the zombie process will go away after 60 minutes of continuous inactivity.
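
For readers unfamiliar with the keepalive idea discussed above, here is a minimal sketch of how a server process can ask the operating system to probe an idle TCP connection, assuming a BSD-style sockets API. It is an illustration only and is not taken from orasrv or SQL*Net; the function names enable_keepalive and keepalive_enabled are hypothetical. How long the system waits before it starts probing is an operating system setting and is not shown here.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Ask the kernel to send keepalive probes on an idle TCP connection.

    Hypothetical helper for illustration; not part of any Oracle product.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def keepalive_enabled(sock: socket.socket) -> bool:
    """Report whether SO_KEEPALIVE is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    # An unconnected socket is enough to show the option being set.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", keepalive_enabled(s))
    s.close()
```

With the option set, a server blocked in a read on a connection whose peer has vanished should eventually get an error back from the read, at which point it can clean up and exit.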

                          Ian MacGregor
                          Stanford Linear Accelerator Center
                          IAN_at_SLAC.STANFORD.EDU (415) 926-3528
Received on Tue Jul 14 1992 - 17:34:36 CEST
