Re: Client sqlplus SELECT + SPOOL knocks down server network

From: Carlos <miotromailcarlos_at_netscape.net>
Date: Tue, 11 Mar 2008 01:43:03 -0700 (PDT)
Message-ID: <a1b596a3-ab4f-436f-88c0-133797f82fc4@n77g2000hse.googlegroups.com>


On 10 mar, 18:27, DA Morgan <damor..._at_psoug.org> wrote:
> Carlos wrote:
> > On 10 mar, 14:08, DA Morgan <damor..._at_psoug.org> wrote:
> >> Carlos wrote:
> >>> On 10 mar, 13:21, DA Morgan <damor..._at_psoug.org> wrote:
> >>>> Carlos wrote:
> >>>>> Hello all:
> >>>>> I'm facing a strange problem on a 10.2.0.3 Oracle on Oracle Enterprise
> >>>>> Linux 5 server.
> >>>>> This morning someone was issuing a SELECT from a windows client via
> >>>>> ODBC and suddenly all the server communications broke down: even no
> >>>>> ping from or to that server worked.
> >>>>> Tried stop&start network but no avail. The server appears to be dead
> >>>>> at communications until reboot.
> >>>>> So I tried to repeat the SELECT from sqlplus on a windows client to a
> >>>>> OS file with SPOOL and all the communications on the server are gone
> >>>>> once again. Ping only answers to 127.0.0.1 or its own IP address, and
> >>>>> Linux is telling me that the network card is OK. Stop&start network
> >>>>> didn't fix the problem either.
> >>>>> Client sqlnet.log shows:
> >>>>> Fatal NI connect error 12170.
> >>>>> VERSION INFORMATION:
> >>>>> TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
> >>>>> Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version
> >>>>> 10.2.0.1.0 - Production
> >>>>> Time: 10-MAR-2008 12:11:59
> >>>>> Tracing not turned on.
> >>>>> Tns error struct:
> >>>>> ns main err code: 12535
> >>>>> TNS-12535: Message 12535 not found; No message file for
> >>>>> product=NETWORK, facility=TNS
> >>>>> ns secondary err code: 12560
> >>>>> nt main err code: 505
> >>>>> TNS-00505: Message 505 not found; No message file for
> >>>>> product=NETWORK, facility=TNS
> >>>>> nt secondary err code: 60
> >>>>> nt OS err code: 0
> >>>>> Client address: <unknown>
> >>>>> The select is a two column (varchar2(9), char(1)) select of 11 million
> >>>>> rows aprox.
> >>>>> Could anybody tell me where to search next? (I didn't find anything at
> >>>>> metalink).
> >>>>> TIA.
> >>>>> Cheers.
> >>>>> Carlos.
> >>>> Some of what you've written seems self-conflicting.
> >>>> You start off telling us you've 10.2.0.3 and then everything
> >>>> thereafter indicates 10.2.0.1.
> >>>> Then you say "dead until reboot." Does that mean it is now
> >>>> working fine or not?
> >>>> How about queries using Oracle's native interface SQLNET
> >>>> rather than ODBC?
> >>>> Ping only answers to 127.0.0.1, I presume on that machine,
> >>>> have you verified that other addresses exist?
> >>>> Have you performed an ipconfig /renew?
> >>>> Perform an ipconfig /all ... what does it show?
> >>>> Are you using DHCP?
> >>>> Can you ping the server from elsewhere on the network?
> >>>> What are your firewall settings?
> >>>> What tool is connecting using ODBC?
> >>>> --
> >>>> Daniel A. Morgan
> >>>> Oracle Ace Director & Instructor
> >>>> University of Washington
> >>>> damor..._at_x.washington.edu (replace x with u to respond)
> >>>> Puget Sound Oracle Users Groupwww.psoug.org
> >>> Hello Daniel:
> >>>> You start off telling us you've 10.2.0.3 and then everything
> >>>> thereafter indicates 10.2.0.1.
> >>> Yup: 10.2.0.3 is the server on EL5. 10.2.0.1 was the windows client
> >>> from which I did the test.
> >>>> Then you say "dead until reboot." Does that mean it is now
> >>>> working fine or not?
> >>> Working fine after reboot... until I issue the SELECT again.
> >>>> How about queries using Oracle's native interface SQLNET
> >>>> rather than ODBC?
> >>> This is what I did on the windows 10.2.0.1 client with sqlplus.
> >>>> Ping only answers to 127.0.0.1, I presume on that machine,
> >>>> have you verified that other addresses exist?
> >>> Yes. If I ping the server network IP address from the server itself,
> >>> it answers OK, but no answer from any other IP in the network. If I
> >>> ping the server from another machine there is no answer.
> >>>>> Are you using DHCP?
> >>> Nope.
> >>>> Can you ping the server from elsewhere on the network?
> >>> Nope
> >>>> What are your firewall settings?
> >>> No firewall. Isolated environment.
> >>>> What tool is connecting using ODBC?
> >>> Visual FoxPro. But, as said before, sqlplus bring the server
> >>> communications to its knees too.
> >>> I'll try the ipconfig (after repeating the whole process from startup
> >>> the server again).
> >>> Thank you very much.
> >>> Cheers.
> >>> Carlos.
> >> A few additional thoughts.

>

> >> 1. Upgrade the client to 10.2.0.3
> >> 2. Post the SQL statement: I'd like to see it
> >> 3. Capture the statement and see if it contains any non-visible
> >> characters when Fox sends it through
> >> 4. What ODBC driver? Try changing drivers
>

> >> Is it only this one SQL statement or can more than one statement
> >> do it? What happens if you take the problem statement and
> >> reconstruct it from scratch in SQL*Plus starting with the
> >> simplest SELECT single_column FROM single_table and then
> >> incrementally add back its full functionality? When does it
> >> break?
> >> --
> >> Daniel A. Morgan
> >> Oracle Ace Director & Instructor
> >> University of Washington
> >> damor..._at_x.washington.edu (replace x with u to respond)
> >> Puget Sound Oracle Users Groupwww.psoug.org
>

> > Daniel:
>

> > I am doing the test from sqlplus (no ODBC this time)
>

> >> 2. Post the SQL statement: I'd like to see it
>

> > select /*+ full(a) */
> > a.col1,
> > a.col2
> > from schema.table_name a;
>

> > It starts displaying/spooling and then breaks with:
>

> > ...
> > 955800207 T
> > ERROR:
> > ORA-03135: la conexión ha perdido contacto
>

> > 10078290 filas seleccionadas.
>

> > ERROR:
> > ORA-03114: no conectado a ORACLE
>

> > (There are 11339771 rows in the table).
>

> > From here on, there is no ping from or to the server.
>

> > ifconfig on the server shows:
>

> > RX packets:674109 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:674109 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:53271778 (50.8 MiB) TX bytes:259126296 (247.1 MiB)
> > Interrupt:169
>

> > And it's time to reboot again...
>

> > Cheers.
>

> > Carlos.
>

> Aha:
>

> https://metalink.oracle.com/metalink/plsql/f?p=130:14:114792536629517...
>

> The title is: Subject: ORA-03113 Or ORA-03135 Reported for Failover
> with Network Outage
>

> The doc id is: Note:455771.1
>

> The date is: 10-MAR-2008
>

> That's today.
>

> The bug number is:
> Bug 5758870 Select Type TAF fails with ORA-3113 If Network Cable Pulled off
>

> The solution is:
> Problem is fixed from 11g onwards and in 10.2.0.4 Patch set release.
> One off patchs for some platforms are available.
> Apply one-off Patch 5758870 for your platform, if available.
>

> Are you on a RAC cluster?
> --
> Daniel A. Morgan
> Oracle Ace Director & Instructor
> University of Washington
> damor..._at_x.washington.edu (replace x with u to respond)
> Puget Sound Oracle Users Groupwww.psoug.org

Daniel:

First of all: thank you very much for your attention and time.

> Are you on a RAC cluster?

Nope. Single Instance.

I've taken a look at the metalink note that you kindly pointed, but no TAF here.

I'm beggining to suspect of a faulty network card. I've noticed that the problem occurs when there is a long information transfer from server to client, but not based on size, but on the time the transfer takes. I checked with FTP and big files, but it takes not too much time to transfer (the network card and switch are 1000Mbps) and the error does not arise. But when one sqlplus session takes too long to pull out a query the error appears in a random fashion: sometimes at 10M rows, sometimes at 7M rows, even at 700000 rows.

The server itself works OK as far as its 'local' network activity goes: the listener is up & running and the enterprise manager works too (from a local browser). The same query from a local sqlplus (with listener connection, no bequeath) works too.

Michael:

>>This can be tested by moving the system cable to a different switch port

Yes. I thought of that too (switch issues). But the network boy swears everything is OK. I'll try to convince him to test on other switch.

Thank you all for the suggestions and hints.

Cheers.

Carlos. Received on Tue Mar 11 2008 - 03:43:03 CDT

Original text of this message