On fork's, wait's, SIGCHLD and Pro*C

From: Andy Edwards <andy_at_max.uis.com>
Date: Thu, 26 Aug 1993 03:57:41 GMT
Message-ID: <CCCLo6.Ev0_at_max.uis.com>


We seem to have run into a stumbling block with Oracle 7.0.12 and Pro*C 1.5.6(?) on HP9000 S800 with HPUX 9.0. Hope I remember all of the details....

We have a program that does the following:

  • opens several file descriptors (ttys) and stores them in a list.
  • loops forever doing a select() on the file descriptors.
  • when there is action on a tty, the program fork's a child to handle the activity. This child does some SQL and exits. In addition, the file descriptor is removed from the select() list so that subsequent activity is only seen in the child. Finally, the PID of the child is noted so that we know a child is working on that file descriptor.
  • in the meantime, the parent process goes back to the select() to look for more action. However, before it gets back to the select(), it does a CONNECT, followed by some SQL, and then a COMMIT RELEASE. Only then does it return to the select().
  • in order to get the file descriptor back in the select() list, the parent must know when the child exits, and find the file descriptor associated with the child's PID. This is done by catching SIGCHLD and setting a flag. The flag is then checked just before select() is called again. If it is set, waitpid() is called to find any PIDs that need to be looked at. For each PID, the associated file descriptor is put back in the file descriptor set for the select().

We are having a problem in that sometimes the PID associated with one of our children is never returned by waitpid(), so we never get the file descriptor back in the select() list.

I believe that the CONNECT, SQL, COMMIT RELEASE sequence interferes with our process handling scheme. The evidence is:

  1. I know a child process is fork'ed in that sequence because I am catching SIGCHLD immediately after that sequence even when I have no children of my own (SIGCHLD is blocked while Oracle stuff is happening).
  2. I *never* see a stray PID from waitpid(). That is, I never see a PID that I did not fork().
  3. There are never zombies that need to be wait'ed on.

My hypothesis is that the CONNECT statement fork's a child to do SQL stuff. Then the RELEASE tells the child to exit, and then performs a wait() to clean up after itself. Unfortunately, the above conditions lead me to believe that RELEASE is, in fact, wait'ing for any and all children, thereby eating my PIDs as well as its own.

Can anyone confirm or refute my claim here? I will probably need someone who has seen the source or experienced the problem. If my claim holds, then this is a bug in the Pro*C library that needs to be fixed.

In the meantime, any suggestions for a workaround?

   thanks,

        -andy
        
Received on Thu Aug 26 1993 - 05:57:41 CEST
