Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Listener problems on 9.2/Linux

From: Tomislav Sagrak <tsagrak_at_opus.hr>
Date: Mon, 3 Mar 2003 17:52:42 +0100
Message-ID: <b40180$a0v$1@brown.net4u.hr>


We have a serious problem at a production site that was recently upgraded to SLES8 (2.4.19-233 SMP kernel) and Oracle 9.2.0.2. The database runs in dedicated server mode with about 150 connected users, plus many batch processes from an automated warehouse management system, each of which makes very short connections to the database.

We are now facing the same problem we had on our previous production installation, but this time we have no workaround.
On the previous SUSE 7.3 installation with 9.2.0.1, we offloaded the listener by creating 3 additional listeners, and that helped.

Specifically, a listener hangs from time to time under load. The system accepts about 10,000 connection requests per day, spread among 4 listeners. When a listener stops working, I can see nothing in the trace or log files, but the command "ps -elf" gives the following output:

000 S oracle 7953 1 0 75 0 - 3379 pipe_w 05:00 ? 00:00:00 /opt/oracle/product/9ir2/bin/tnslsnr LISTENER1 -inherit
000 S oracle 7959 1 0 76 0 - 3378 schedu 05:00 ? 00:00:04 /opt/oracle/product/9ir2/bin/tnslsnr LISTENER3 -inherit
000 S oracle 20628 1 0 75 0 - 3385 schedu 10:53 pts/0 00:00:03 /opt/oracle/product/9ir2/bin/tnslsnr LISTENER2 -inherit
000 S oracle 24149 1 0 75 0 - 3379 schedu 11:32 pts/2 00:00:00 /opt/oracle/product/9ir2/bin/tnslsnr LISTENER -inherit

Note the WCHAN field: the normally running listeners show "schedu", while the hung one (LISTENER1) shows "pipe_w". It seems that the process is waiting indefinitely for something from Linux (or Oracle?).
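
The WCHAN check described above could be automated. The following is only a minimal sketch: the function name find_suspect_listeners is made up for illustration, and the column positions assume exactly the "ps -elf" layout shown above (F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD). A healthy process may also sit in "pipe_w" briefly, so a single sample is a hint rather than proof.

```python
# Sketch: flag tnslsnr processes whose WCHAN column reads "pipe_w".
# Assumes the `ps -elf` column layout from the post:
# F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD...

SAMPLE_PS_OUTPUT = """\
000 S oracle 7953 1 0 75 0 - 3379 pipe_w 05:00 ? 00:00:00 /opt/oracle/product/9ir2/bin/tnslsnr LISTENER1 -inherit
000 S oracle 7959 1 0 76 0 - 3378 schedu 05:00 ? 00:00:04 /opt/oracle/product/9ir2/bin/tnslsnr LISTENER3 -inherit
000 S oracle 20628 1 0 75 0 - 3385 schedu 10:53 pts/0 00:00:03 /opt/oracle/product/9ir2/bin/tnslsnr LISTENER2 -inherit
000 S oracle 24149 1 0 75 0 - 3379 schedu 11:32 pts/2 00:00:00 /opt/oracle/product/9ir2/bin/tnslsnr LISTENER -inherit
"""


def find_suspect_listeners(ps_output, suspect_wchan="pipe_w"):
    """Return (pid, listener_name) pairs for tnslsnr rows in the suspect WCHAN."""
    suspects = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Index 14 is the executable path, index 15 the listener name.
        if len(fields) < 16 or not fields[14].endswith("/tnslsnr"):
            continue
        if fields[10] == suspect_wchan:  # index 10 is WCHAN
            suspects.append((int(fields[3]), fields[15]))  # index 3 is PID
    return suspects


if __name__ == "__main__":
    for pid, name in find_suspect_listeners(SAMPLE_PS_OUTPUT):
        print(f"listener {name} (pid {pid}) is waiting in pipe_w")
```

In practice the sample text would be replaced by live "ps -elf" output, sampled more than once before any action is taken.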

Anyway, the last few lines of the trace file before the listener hung might help:

[03-MAR-2003 11:50:22:900] sntpcall: hdl[IR]=15, hdl[IW]=14
[03-MAR-2003 11:50:22:900] nsbeqaddr: doing connect handshake...
[03-MAR-2003 11:50:22:900] nsbequeath: doing connect handshake...
[03-MAR-2003 11:50:22:902] nsbequeath: NSE=0
[03-MAR-2003 11:50:22:902] nsbeqaddr: connect handshake is complete
[03-MAR-2003 11:50:22:902] nstimarmed: no timer allocated
[03-MAR-2003 11:50:22:902] nsclose: closing transport
[03-MAR-2003 11:50:22:902] nsclose: global context check-out (from slot 3) complete
[03-MAR-2003 11:50:22:976] nsopen: opening transport...
[03-MAR-2003 11:50:22:976] nttcnp: Validnode Table IN use; err 0x0
[03-MAR-2003 11:50:22:976] nttcnp: getting sockname
[03-MAR-2003 11:50:22:976] nttcnr: waiting to accept a connection.
[03-MAR-2003 11:50:22:976] nttcnr: getting sockname
[03-MAR-2003 11:50:22:976] nttcnr: connected on ipaddr 192.168.254.2
[03-MAR-2003 11:50:22:976] nttvlser: valid node check on incoming node 192.168.254.1
[03-MAR-2003 11:50:22:976] nttvlser: Accepted Entry: 192.168.254.1
[03-MAR-2003 11:50:22:977] nttcon: set TCP_NODELAY on 10
[03-MAR-2003 11:50:22:977] nsopen: transport is open
[03-MAR-2003 11:50:22:977] nsnainit: inf->nsinfflg[0]: 0xd inf->nsinfflg[1]: 0xd
[03-MAR-2003 11:50:22:977] nsopen: global context check-in (to slot 3) complete
[03-MAR-2003 11:50:22:977] nsanswer: deferring connect attempt; at stage 5
[03-MAR-2003 11:50:22:977] nscon: doing connect handshake...
[03-MAR-2003 11:50:22:977] nscon: got NSPTCN packet
[03-MAR-2003 11:50:22:977] nsevdansw: exit
[03-MAR-2003 11:50:22:978] nsbeqaddr: connecting...
[03-MAR-2003 11:50:22:979] sntpcall: About to exec /opt/oracle/product/9ir2/bin/oracle
[03-MAR-2003 11:50:22:994] sntpcall: detaching from parent with additional fork
[03-MAR-2003 11:50:22:994] sntpcall: result string is NTP0 25272
[03-MAR-2003 11:50:22:994] sntpcall: hdl[IR]=15, hdl[IW]=14
[03-MAR-2003 11:50:22:994] nsbeqaddr: doing connect handshake...
[03-MAR-2003 11:50:22:994] nsbequeath: doing connect handshake...
    ...and there it hangs (no log or trace output beyond this point)
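
In the trace above, the first bequeath handshake is followed by "connect handshake is complete", while the last one is not. A minimal sketch of how a trace file could be scanned for that pattern follows. The function name unfinished_bequeath_handshakes is hypothetical, and the line format is assumed to match the excerpt above with each entry on a single line; this is not an Oracle-provided tool.

```python
import re

# Matches lines such as:
# [03-MAR-2003 11:50:22:994] nsbeqaddr: doing connect handshake...
TRACE_LINE = re.compile(r"^\[(?P<stamp>[^\]]+)\]\s+(?P<func>\w+):\s+(?P<msg>.*)$")

SAMPLE_TRACE = """\
[03-MAR-2003 11:50:22:900] nsbeqaddr: doing connect handshake...
[03-MAR-2003 11:50:22:900] nsbequeath: doing connect handshake...
[03-MAR-2003 11:50:22:902] nsbequeath: NSE=0
[03-MAR-2003 11:50:22:902] nsbeqaddr: connect handshake is complete
[03-MAR-2003 11:50:22:994] nsbeqaddr: doing connect handshake...
[03-MAR-2003 11:50:22:994] nsbequeath: doing connect handshake...
"""


def unfinished_bequeath_handshakes(trace_text):
    """Return timestamps of nsbeqaddr handshakes that have no completion line."""
    pending = None      # timestamp of a handshake that has started but not completed
    unfinished = []
    for line in trace_text.splitlines():
        match = TRACE_LINE.match(line)
        if not match or match.group("func") != "nsbeqaddr":
            continue
        msg = match.group("msg")
        if msg.startswith("doing connect handshake"):
            if pending is not None:
                unfinished.append(pending)  # earlier handshake never completed
            pending = match.group("stamp")
        elif msg.startswith("connect handshake is complete"):
            pending = None
    if pending is not None:
        unfinished.append(pending)
    return unfinished


if __name__ == "__main__":
    print(unfinished_bequeath_handshakes(SAMPLE_TRACE))
```

A handshake that is still in progress when the file is read will also be reported, so the result only means something once the listener is known to have stopped responding.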

The hung listener gives no response to LSNRCTL commands; only "kill -9" followed by a restart solves the problem.
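
Since a hung listener does not answer LSNRCTL, a probe with a timeout is one way to notice the hang without watching by hand. This is only a sketch under assumptions: the function name listener_responds and the 10-second timeout are made up for illustration, and treating a zero exit code from "lsnrctl status" as "healthy" is an assumption that should be checked on the actual installation. It only detects the hang; it does not kill or restart anything.

```python
import subprocess


def listener_responds(listener_name, timeout_s=10.0, command=None):
    """Return True if the status probe exits with code 0 within timeout_s.

    By default the probe is `lsnrctl status <listener_name>`; `command`
    allows a different argv list to be used instead.
    """
    argv = command if command is not None else ["lsnrctl", "status", listener_name]
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # no answer in time: treat the listener as hung
    except FileNotFoundError:
        return False  # probe binary is not on PATH
    # Assumption: a zero exit code means the listener answered normally.
    return result.returncode == 0


if __name__ == "__main__":
    # Listener names taken from the ps output in this post.
    for name in ("LISTENER", "LISTENER1", "LISTENER2", "LISTENER3"):
        state = "ok" if listener_responds(name) else "NOT responding"
        print(f"{name}: {state}")
```

Run from cron as the oracle user, the output could feed an alert so that the manual "kill -9" and restart happen sooner.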

The problem seems to become more frequent after the system has been up for several days. I am not sure, but it did not occur during the first 24 hours (Thursday), it happened once on Friday, and today it has happened about 6 times.

It is not easy to get this problem solved through MetaLink, so if anyone has an idea, please help, since it is affecting users.

Thanks to all in advance,

Tom Sagrak

Received on Mon Mar 03 2003 - 10:52:42 CST
