RE: Standby hung following a network disconnect

From: Chandra Pabba <chandra_pabba_at_verizon.net>
Date: Sat, 03 Jul 2010 19:15:18 -0500
Message-id: <000601cb1b0d$fbda1790$f38e46b0$_at_net>



Vasu,  

Which attributes (for example REOPEN, NET_TIMEOUT, MAX_FAILURE) are currently set on the LOG_ARCHIVE_DEST_n parameter that points to the standby?
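One way to answer this quickly is to take the destination string as it appears in V$PARAMETER and list which of the attributes named above are not set. The following is only a minimal sketch: the function names and the sample value (including the service name) are hypothetical and are not taken from Vasu's configuration, and the parser does not handle nested parentheses such as a full connect descriptor.

```python
import re

# Attributes named in the question; extend as needed.
ATTRIBUTES_OF_INTEREST = ("REOPEN", "NET_TIMEOUT", "MAX_FAILURE")

# KEY=(...) | KEY="..." | KEY=word | bare keyword such as LGWR or ASYNC.
_TOKEN = re.compile(r'(\w+)\s*=\s*(\([^)]*\)|"[^"]*"|\S+)|(\w+)')


def parse_dest(value):
    """Split a LOG_ARCHIVE_DEST_n value into {ATTRIBUTE: setting}."""
    attrs = {}
    for match in _TOKEN.finditer(value):
        if match.group(1):
            attrs[match.group(1).upper()] = match.group(2).strip('"')
        else:
            # Bare keywords carry no value; record only that they are present.
            attrs[match.group(3).upper()] = True
    return attrs


def unset_attributes(value):
    """Return the attributes of interest that the value does not set."""
    attrs = parse_dest(value)
    return [name for name in ATTRIBUTES_OF_INTEREST if name not in attrs]


# Hypothetical sample value, for illustration only.
sample = "SERVICE=dgmydb LGWR ASYNC REOPEN=300 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)"
print(parse_dest(sample))
print(unset_attributes(sample))
```

An attribute reported as unset falls back to its default for that release, so the defaults are worth checking in the documentation as well.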

Thanks
Chandra

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Vasu Rajagopal
Sent: Friday, July 02, 2010 10:15 PM
To: oracle-l_at_freelists.org
Subject: Standby hung following a network disconnect    

Hi,  

I have a Data Guard issue. The primary is a production RAC database on 10.2.0.4 using LGWR ASYNC redo transport to the DR site, which is also a RAC configuration (physical standby).

One of the standby databases (in real-time apply mode) hangs after a NETWORK DISCONNECT error, generating a few Gb/sec of read I/O on the standby redo logs (SRLs), and it appears to stay stuck indefinitely.

This has occurred almost twice a week over the last 2 months.

As a temporary workaround, we cancelled the managed recovery process (MRP) and put it into ARCH apply mode. That seems to be working, though we would like to have the DR site running in REAL TIME APPLY mode.

I have received an update from Oracle saying that most issues relating to ORA-03135 have turned out to be a router, switch, firewall, or HTTP protocol issue. They asked us, if a Cisco router is in use, to disable the fixup protocol for the SQL*Net port, and so on.

However, I am not sure why the LNS/MRP processes are unable to recover and return to normal operation after detecting the timeout.

Looking for input on ways to diagnose or resolve this.

Thanks,

Vasu  

Here are excerpts from the log files showing the disconnect:

Primary DB (mydb) alert log



Errors in file /u001/app/oracle/admin/mydb/bdump/mydb1_lns1_22694.trc:
ORA-03135: connection lost contact
Fri Jun 25 16:00:42 2010
LGWR: I/O error 3135 archiving log 2 to
'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=racsrvr11.mycomp.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dgmydb_XPT.mycomp.com)(INSTANCE_NAME=dgmydb2)(SERVER=dedicated)))'

Primary LNS Trace file --- mydb1_lns1_22694.trc



Sending online log thread 1 seq 14857 [logfile 2] to standby
Archiving to destination
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=racsrvr11.mycomp.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dgmydb_XPT.mycomp.com)(INSTANCE_NAME=dgmydb2)(SERVER=dedicated)))
ASYNC blocks=20480
Log file opened [logno 2]
*** 2010-06-25 16:00:42.157
RFS network connection lost at host
'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=racsrvr11.mycomp.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dgmydb_XPT.mycomp.com)(INSTANCE_NAME=dgmydb2)(SERVER=dedicated)))'
Error 3135 writing standby archive log file at host
'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=racsrvr11.mycomp.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=dgmydb_XPT.mycomp.com)(INSTANCE_NAME=dgmydb2)(SERVER=dedicated)))'
ORA-03135: connection lost contact
*** 2010-06-25 16:00:42.170 64208 kcrr.c

Standby alert log



Mem# 0: /netapp/oracle/dr/redologs11/dgmydb/group_32.1296.718556259
Fri Jun 25 16:00:42 2010
RFS[2]: Possible network disconnect with primary database
Fri Jun 25 18:14:17 2010
Redo Shipping Client Connected as PUBLIC -- Connected User is Valid
RFS[6]: Assigned to RFS process 10755
RFS[6]: Identified database type as 'physical standby'    
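To see how long each outage lasts across the last two months of incidents, the standby alert log can be scanned for the two messages shown in the excerpt above. The following is a rough sketch, assuming the alert log puts each timestamp on its own line ahead of the messages it covers and that the process runs under an English locale (the `%a` and `%b` codes depend on it). The function name is hypothetical, and the gap it reports is only the time between the disconnect message and the next redo shipping connection, not proof of how long MRP was stuck.

```python
from datetime import datetime

# Alert log timestamp layout, e.g. "Fri Jun 25 16:00:42 2010".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def disconnect_gaps(lines):
    """Return (lost_at, reconnected_at, gap) for each disconnect/reconnect pair."""
    gaps = []
    current = None   # most recent timestamp line seen
    lost_at = None   # timestamp of the first unmatched disconnect
    for raw in lines:
        line = raw.strip()
        try:
            current = datetime.strptime(line, STAMP_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, treat it as a message
        if "Possible network disconnect" in line:
            if lost_at is None:
                lost_at = current
        elif "Redo Shipping Client Connected" in line and lost_at is not None:
            gaps.append((lost_at, current, current - lost_at))
            lost_at = None
    return gaps


# Lines taken from the standby alert log excerpt above.
sample = [
    "Fri Jun 25 16:00:42 2010",
    "RFS[2]: Possible network disconnect with primary database",
    "Fri Jun 25 18:14:17 2010",
    "Redo Shipping Client Connected as PUBLIC",
]
for lost, back, gap in disconnect_gaps(sample):
    print(lost, back, gap)
```

Run against the whole alert log (for example `disconnect_gaps(open(path))`, with `path` pointing at the standby alert log), this gives one row per incident, which can then be compared with the network team's own outage records.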

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Jul 03 2010 - 19:15:18 CDT
