
RE: How to reproduce a hanging connect attempt

From: <Alexander.Feinstein_at_mitchell1.com>
Date: Tue, 24 Sep 2002 16:23:25 -0800
Message-ID: <F001.004D86AD.20020924162325@fatcity.com>


Jared,

It is adequate.
Below is a snippet from Steve Adams's script (db_check.sh); I have successfully used a similar technique for some time.

---------- snip ------------
# run the connect in the background; $READY appears once it returns
rm -f $READY
print "
        connect nobody/really
        host touch $READY
        exit " |
sqlplus /nolog > $SPOOL &

# wait for up to 59 seconds
#
((timeout = 60))
while ((timeout -= 1)) && [[ ! -r $READY ]]
do
        sleep 1
done

# check for hang
#
[[ -r $READY ]] ||
{
        kill $!
        msg="$PROGRAM: Oracle instance $ORACLE_SID is not responding"
        $DEBUG logger -p oracle.err "$msg"
        STATUS=1
        $INTERACTIVE $msg
        continue
}
---------- snip ------------

Alex.
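
The same watchdog idea can be sketched without the marker file, as a rough illustration rather than anything taken from db_check.sh. The function name run_with_timeout and the return value 124 are assumptions made for this example, and its variables are global because plain sh has no local scope.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of db_check.sh): runs COMMAND in the
# background and polls once a second. Returns the command's own exit
# status if it finishes in time, or 124 (an arbitrary value chosen for
# this sketch) if it had to be killed.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    pid=$!
    waited=0
    # kill -0 sends no signal; it only tests whether the process exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null   # reap the killed process
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # the command ended by itself: hand back its exit status
    wait "$pid"
}
```

For trying it out, "run_with_timeout 2 sleep 30" stands in for a connect that hangs; the sqlplus invocation would take the place of sleep in a real check.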

-----Original Message-----
From: Jared.Still_at_radisys.com
Sent: Tuesday, September 24, 2002 12:23 PM
To: Multiple recipients of list ORACLE-L
Subject: RE: How to reproduce a hanging connect attempt

Thanks for the info Ian.

I've been asked to prove why sqlplus and ksh are not adequate for checking connectivity. The third possibility, a hang, is exactly that reason.

I'm trying to duplicate what can actually happen to cause a hanging connection. I've been burned by that in the past when my script didn't properly allow for hangs.

Jared

"MacGregor, Ian A." <ian_at_SLAC.Stanford.EDU>
Sent by: root_at_fatcity.com
09/24/2002 11:59 AM
Please respond to ORACLE-L

        To:     Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
        cc: 
        Subject:        RE: How to reproduce a hanging connect attempt


Have you fooled with the CONNECT_TIMEOUT_<LISTENER> parameter of listener.ora? Setting it to 0 won't guarantee a connection will hang, but will tell a process to wait forever to connect. Hanging connections were a problem for us with the earlier Oracle 6 releases. My solution was less elegant. It used one program which attempted to connect, wrote a timestamp, and signaled if the connection failed; another checked the timestamp against the current time and signaled if the difference was too great. I cannot recall seeing the hanging problem for years, but we still run the program to check for it.
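
A two-program timestamp check of this kind might look roughly like the sketch below. It is not the original programs: the file path, the 120-second limit, and the function names are all assumptions for illustration, and it assumes a date command that supports +%s (GNU and BSD date both do).

```shell
#!/bin/sh
# Hypothetical heartbeat file and age limit; both values are
# illustrative and not taken from the programs described above.
HEARTBEAT=${HEARTBEAT:-/tmp/db_heartbeat}
MAX_AGE=${MAX_AGE:-120}

# Called by the connecting program each time a connect attempt returns.
# A hung connect never gets here, so the timestamp goes stale.
record_heartbeat() {
    date +%s > "$HEARTBEAT"
}

# Called by the second program. Returns 0 if the heartbeat is fresh,
# 1 if it is missing, unreadable, malformed, or older than MAX_AGE.
heartbeat_is_fresh() {
    [ -r "$HEARTBEAT" ] || return 1
    last=$(cat "$HEARTBEAT")
    case "$last" in
        ''|*[!0-9]*) return 1 ;;   # not a plain number of seconds
    esac
    now=$(date +%s)
    [ $((now - last)) -le "$MAX_AGE" ]
}
```

The second program would call heartbeat_is_fresh on a schedule and raise its alert whenever it returns non-zero.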

For years I've been stating that three things can happen on an Oracle connection attempt: it can be successful, it can fail, or it can hang and return nothing. Yet, 100% of the scripts I see which attempt to connect to the database to ensure it is functional do not consider the third possibility.

Seems with your upcoming article that percentage will drop to 99.9999.

Ian MacGregor
Stanford Linear Accelerator Center
ian_at_SLAC.Stanford.edu

-----Original Message-----
Sent: Monday, September 23, 2002 10:03 PM
To: Multiple recipients of list ORACLE-L

Dear List,

As an example for an article I'm working on, I'm showing how a hanging connect can be timed out in a Perl script via the alarm() call.

By 'hanging connect' I mean a connection attempt that never connects and never returns an error code.

I have one right now on my Linux box. I started a database, did kill -9 on the oracle processes, and now attempts to log in to the database hang. It's been that way for 24 hours now.

e.g. sqlplus scott/tiger@ts98

.. never returns an error code, never connects.

Guess it isn't going to connect. This could be a problem in a ksh script written to check connectivity. ( which is why I use Perl )

The question is, why? What is a consistent way to reproduce this error? The method I used isn't consistent.

This is something that I see happen from time to time on Oracle databases, both NT and Unix platforms, hence the reason for the timeout on the connect.

Any thoughts on how to consistently reproduce this, on either platform?

Thanks,

Jared

-- 


Received on Tue Sep 24 2002 - 19:23:25 CDT
