RE: How to reproduce a hanging connect attempt

Date: Wed, 25 Sep 2002 11:48:29 -0800
OK, maybe adequate was the wrong word.

It *can* be done with ksh. But it's rather ungainly.

In Perl:

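   use DBI;

   my $dbh;
   eval {

      # die if the connect has not returned within 60 seconds
      local $SIG{ALRM} = sub { die "connect timed out\n" };
      alarm 60;

      # $username and $password are placeholders, assumed to be set elsewhere
      $dbh = DBI->connect(
         'dbi:Oracle:' . $db,
         $username, $password,
      );

      alarm 0;
   };
   alarm 0;   # make sure the alarm is cleared even if connect died

   unless ($dbh) {
      print "db $db is down\n";
   }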

In addition, there are a number of other things I do with my monitoring scripts that could probably be done in ksh, but Perl is a much better tool for the job. Notification of DBAs on a rotating schedule, with optional notification of a manager.

Set hours of operation per database, don't page the DBA outside those hours, just send email.

Set hours to page immediately, outside of those hours don't page until a configurable number of attempts have been made.

etc, etc, etc.
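For illustration only, the paging rules above could be sketched in shell roughly as follows. The function name notify_action, the variable names, and all the default hours and thresholds are hypothetical, and the way the two rules interact is one possible reading, not a description of the actual monitoring scripts.

```shell
# All defaults below are hypothetical values chosen for the example.
OPS_START=${OPS_START:-7}            # hours of operation: 07:00 up to 19:00
OPS_END=${OPS_END:-19}
PAGE_NOW_START=${PAGE_NOW_START:-9}  # page immediately: 09:00 up to 17:00
PAGE_NOW_END=${PAGE_NOW_END:-17}
MAX_ATTEMPTS=${MAX_ATTEMPTS:-3}      # failed attempts before paging otherwise

# notify_action HOUR ATTEMPTS
# Prints "email", "page", or "hold" for a database that failed its check.
notify_action() {
    hour=$1
    attempts=$2

    # Outside the hours of operation: never page, just send email.
    if [ "$hour" -lt "$OPS_START" ] || [ "$hour" -ge "$OPS_END" ]; then
        echo email
        return 0
    fi

    # Inside the immediate-page window: page on the first failure.
    if [ "$hour" -ge "$PAGE_NOW_START" ] && [ "$hour" -lt "$PAGE_NOW_END" ]; then
        echo page
        return 0
    fi

    # Otherwise hold off until enough attempts have failed.
    if [ "$attempts" -ge "$MAX_ATTEMPTS" ]; then
        echo page
    else
        echo hold
    fi
}
```

Keeping the decision in one small function means the schedule can be changed per database by setting the variables before the call.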

Now I'm getting far afield, and I'll stop.

It is adequate.
Below is a snippet from Steve Adams's script; I successfully used a similar technique for some time.
---------- snip ------------
rm -f $READY
print "
        connect nobody/really
        host touch $READY
        exit " |
sqlplus /nolog > $SPOOL &

# wait for up to 59 seconds
((timeout = 60))
while ((timeout -= 1)) && [[ ! -r $READY ]]
do
        sleep 1
done

# check for hang
[[ -r $READY ]] ||
{
        kill $!
        msg="$PROGRAM: Oracle instance $ORACLE_SID is not responding"
        $DEBUG logger -p oracle.err "$msg"
        $INTERACTIVE $msg
}

---------- snip ------------

Thanks for the info Ian.
I've been asked to prove why sqlplus and ksh are not adequate for checking connectivity. The third possibility, a hang, is exactly that reason.
I'm trying to duplicate what can actually happen to cause a hanging connection. I've been burned by that in the past when my script didn't properly allow for hangs.
Jared

Have you fooled with the CONNECT_TIMEOUT_<LISTENER> parameter of listener.ora? Setting it to 0 won't guarantee a connection will hang, but it will tell a process to wait forever to connect. Hanging connections were a problem for us with the earlier Oracle 6 releases. My solution was less elegant. It used one program which attempted to connect, wrote a timestamp, and signaled if the connection failed; another checked the timestamp against the current time and signaled if the difference was too great. I cannot recall seeing the hanging problem for years, but we still run the program to check for it.
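A rough sketch of that two-program arrangement, in shell, might look like the following. The file location and the names record_attempt and check_stamp are made up for the example, and it assumes a date command that supports +%s (epoch seconds), which is common but not guaranteed on every Unix.

```shell
# Hypothetical location of the timestamp file shared by the two programs.
STAMP=${STAMP:-/tmp/connect_check.stamp}

# Program 1 calls this each time a connect attempt returns, whether the
# attempt succeeded or failed. A hung attempt never gets here, so the
# stamp stops advancing.
record_attempt() {
    date +%s > "$STAMP"
}

# Program 2: check_stamp MAX_AGE
# Prints "stale" if the stamp is missing or older than MAX_AGE seconds,
# otherwise "fresh".
check_stamp() {
    max_age=$1
    now=$(date +%s)
    if [ ! -r "$STAMP" ]; then
        echo stale
        return 1
    fi
    last=$(cat "$STAMP")
    if [ $((now - last)) -gt "$max_age" ]; then
        echo stale
        return 1
    fi
    echo fresh
}
```

The point of splitting the work is that the checker never touches the database, so it cannot hang for the same reason the connecting program does.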
I've been stating that three things can happen on an Oracle connection attempt for years: it can be successful, it can fail, or it can hang and return nothing. Yet, 100% of the scripts I see which attempt to connect to the database to ensure it is functional do not consider the third possibility.
Seems with your upcoming article that percentage will drop to 99.9999.
Ian MacGregor
Stanford Linear Accelerator Center
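To make the three outcomes concrete (success, failure, hang), here is a minimal shell sketch of a wrapper that reports which one occurred. The name try_with_timeout is hypothetical, and a plain command stands in for the real connect attempt; it assumes a shell that reaps finished background children so that kill -0 stops succeeding once the command has exited.

```shell
# try_with_timeout SECONDS COMMAND [ARGS...]
# Prints "ok", "failed", or "hung" and returns 0, 1, or 2 respectively.
try_with_timeout() {
    limit=$1
    shift

    # Run the command in the background so the script itself cannot hang.
    "$@" >/dev/null 2>&1 &
    pid=$!

    # Poll once per second until the command exits or the limit is reached.
    waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$limit" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # Still running after the limit: treat it as a hang and clean up.
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid" 2>/dev/null
        wait "$pid" 2>/dev/null
        echo hung
        return 2
    fi

    # The command finished on its own; its exit status separates the
    # other two outcomes.
    if wait "$pid"; then
        echo ok
        return 0
    fi
    echo failed
    return 1
}
```

A monitoring script would pass its own connect command in place of the stand-in and act on the three results separately, rather than treating anything that is not a success as a plain failure.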
Dear List,
As an example for an article I'm working on, I'm showing how a hanging connect can be timed out in a Perl script via the alarm() call.
By 'hanging connect' I mean a connection attempt that never connects and never returns an error code. I have one right now on my Linux box. I started a database, did kill -9 on the oracle processes, and now attempts to log in to the database hang. It's been that way for 24 hours now. For example, sqlplus scott/tiger@ts98 never returns an error code and never connects. Guess it isn't going to connect. This could be a problem in a ksh script written to check connectivity (which is why I use Perl).
The question is, why? What is a consistent way to reproduce this error? The method I used isn't consistent. This is something that I see happen from time to time on Oracle databases, both NT and Unix platforms, hence the reason for the timeout on the connect.
Any thoughts on how to consistently reproduce this, on either platform?
Thanks,


