Re: Client sqlplus SELECT + SPOOL knocks down server network

From: Michael Austin <maustin_at_firstdbasource.com>
Date: Tue, 11 Mar 2008 04:24:57 GMT
Message-ID: <tQnBj.13748$5K1.8337@newssvr12.news.prodigy.net>


DA Morgan wrote:

> Charles Hooper wrote:

>> On Mar 10, 7:51 am, Carlos <miotromailcar..._at_netscape.net> wrote:
>>> Hello all:
>>>
>>> I'm facing a strange problem on a 10.2.0.3 Oracle on Oracle Enterprise
>>> Linux 5 server.
>>>
>>> This morning someone was issuing a SELECT from a windows client via
>>> ODBC and suddenly all the server communications broke down: even no
>>> ping from or to that server worked.
>>>
>>> Tried stop&start network but no avail. The server appears to be dead
>>> at communications until reboot.
>>>
>>> So I tried to repeat the SELECT from sqlplus on a windows client to a
>>> OS file with SPOOL and all the communications on the server are gone
>>> once again. Ping only answers to 127.0.0.1 or its own IP address, and
>>> Linux is telling me that the network card is OK. Stop&start network
>>> didn't fix the problem either.
>>>
>>> Client sqlnet.log shows:
>>>
>>> Fatal NI connect error 12170.
>>>
>>> VERSION INFORMATION:
>>> TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
>>> Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows:
>>> Version
>>> 10.2.0.1.0 - Production
>>> Time: 10-MAR-2008 12:11:59
>>> Tracing not turned on.
>>> Tns error struct:
>>> ns main err code: 12535
>>> TNS-12535: Message 12535 not found; No message file for
>>> product=NETWORK, facility=TNS
>>> ns secondary err code: 12560
>>> nt main err code: 505
>>> TNS-00505: Message 505 not found; No message file for
>>> product=NETWORK, facility=TNS
>>> nt secondary err code: 60
>>> nt OS err code: 0
>>> Client address: <unknown>
>>>
>>> The select is a two column (varchar2(9), char(1)) select of 11 million
>>> rows aprox.
>>>
>>> Could anybody tell me where to search next? (I didn't find anything at
>>> metalink).
>>>
>>> TIA.
>>>
>>> Cheers.
>>>
>>> Carlos.
>>
>> If it does not cause too many issues dealing with another lock-up, you
>> might try enabling a 10046 trace at level 12 and a simultaneous 10053
>> trace at level 1 for the SQL*Plus session, and then execute the SQL
>> statememt. The 10053 trace should output the expected plan for the
>> SQL statement, including the expected number of rows for each
>> operation. If the session crashes, the data generated by 10046 trace
>> portion of the file should provide information indicating why the
>> session crashed.
>>
>> It might be possible that the table/index stats are suggesting that
>> Oracle should use a merge Cartesian join, when it should not, which
>> causes excessive temp tablespace usage, which causes the server's hard
>> drives to fill to capacity, which causes... It could also be a
>> problem with having CURSOR_SHARING set to FORCE.
>>
>> Have you looked for trace files in the udump, bdump, and cdump
>> directories on the server? Also, have you looked at the alert log?
>>
>> Charles Hooper
>> IT Manager/Oracle DBA
>> K&M Machine-Fabricating, Inc.
> 
> I can see everything you listed as a potential issue but I'm not sure
> how any of them could whack an IP address.

What kind router/hub/switch are you using? The symptoms seem familiar.   Rebooting the server can cycle the port (as if disconnected) and all is well again... Also check your Linux variant to see if there are any data-overrun type of bugs reported.

This can be tested by moving the system cable to a different switch port (if it can be done so easily (ie no VLAN definitions etc...)) Received on Mon Mar 10 2008 - 23:24:57 CDT

Original text of this message