Re: RMAN duplicate is failing : database session for channel <channel_name> terminated unexpectedly

From: De DBA <dedba_at_tpg.com.au>
Date: Wed, 21 Mar 2012 10:57:21 +1000
Message-ID: <4F692771.5000503_at_tpg.com.au>



Hi Ashoke,

The logs that you mention are the Oracle Database Extension logs. The media manager logs that I meant are in <Netbackup_Root>/volmgr/debug. This article: http://www.symantec.com/business/support/index?page=content&id=TECH31097 has a list of log locations and process names that may be helpful.

If the lines you show under b) are the last in the file though, there does not seem to be a problem with the mount. These lines merely indicate that the backup piece was restored. As you can see in article TECH53002, if the media manager encounters an error it will be written below the "Error=7504" line.
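
To check whether anything follows the "Error=7504" line, a small script can do the looking for you. This is only a minimal sketch: the function name lines_after_marker is made up for illustration, and the sample lines are the ones you posted under b) with the quote prefix removed.

```python
def lines_after_marker(log_lines, marker="Error=7504"):
    """Return the non-empty lines that follow the LAST line containing marker.

    Returns None if the marker never appears, and an empty list if the
    marker line is the last meaningful line in the log.
    """
    last = -1
    for i, line in enumerate(log_lines):
        if marker in line:
            last = i
    if last < 0:
        return None
    return [l.rstrip("\n") for l in log_lines[last + 1:] if l.strip()]


# Sample taken from the dbclient log excerpt in this thread.
SAMPLE = [
    "09:59:40.758 [6456]<8> int_ReadData: WRN - Failed to set client read timeout.",
    "09:59:40.759 [6456]<2> sbterror: INF - entering",
    "09:59:40.759 [6456]<2> sbterror: INF - Error=7504: Got end-of-file",
]

if __name__ == "__main__":
    trailing = lines_after_marker(SAMPLE)
    if trailing is None:
        print("marker not found")
    elif not trailing:
        print("nothing logged after the marker")
    else:
        print("\n".join(trailing))
```

To run it against the real file, read the log with open(path).readlines() and pass the result in place of SAMPLE.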

The client read timeout that is mentioned is a separate property, unrelated to the media mount timeout, which you can set (in V6.0) on the server or the client side. It defaults to 5 minutes, which the manual states is too short for the database extension. The client uses its local value if it does not receive a value from the server, as is the case in your situation: the log shows that no client read timeout is set. It seems to me that the size of your restore is the issue here, since it may lead to (very) long waits between reads while Oracle is restoring the piece just read.
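
If you want to pull the timeout details out of a longer dbclient log, something like the following could help. It is a rough sketch, not a NetBackup tool: the function name summarize_read_timeout is hypothetical, the patterns are copied from the lines you posted, and converting the value to minutes assumes it is expressed in seconds (the log itself does not state the unit).

```python
import re

# Pattern modelled on the "VxBSAGetEnv: INF - returning - 10800" line in this thread.
RETURN_RE = re.compile(r"VxBSAGetEnv: INF - returning - (\d+)")


def summarize_read_timeout(log_lines):
    """Return (value, failed) for NBBSA_CLIENT_READ_TIMEOUT.

    value  -- the integer returned right after a GetEnv call for the
              variable, or None if no such line was seen
    failed -- True if a "Failed to set client read timeout" warning appears
    """
    value = None
    failed = False
    awaiting_return = False
    for line in log_lines:
        if "entering GetEnv - NBBSA_CLIENT_READ_TIMEOUT" in line:
            awaiting_return = True
            continue
        match = RETURN_RE.search(line)
        if match and awaiting_return:
            value = int(match.group(1))
            awaiting_return = False
        if "Failed to set client read timeout" in line:
            failed = True
    return value, failed


# Sample taken from the dbclient log excerpt in this thread.
SAMPLE = [
    "09:59:40.758 [6456]<4> VxBSASetEnv: INF - entering SetEnv - NBBSA_CLIENT_READ_TIMEOUT",
    "09:59:40.758 [6456]<4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_CLIENT_READ_TIMEOUT",
    "09:59:40.758 [6456]<4> VxBSAGetEnv: INF - returning - 10800",
    "09:59:40.758 [6456]<4> dbc_SetClientReadTimeout: INF - sending client read timeout",
    "09:59:40.758 [6456]<2> xbsa_SetEnv: INF - leaving (0)",
    "09:59:40.758 [6456]<8> int_ReadData: WRN - Failed to set client read timeout.",
]

if __name__ == "__main__":
    value, failed = summarize_read_timeout(SAMPLE)
    print("requested value:", value)
    if value is not None:
        # Assumption: the value is in seconds.
        print("in minutes (if seconds):", value / 60)
    print("set-timeout warning seen:", failed)
```

The point of reporting both fields is that a requested value and a failure to apply it can appear side by side in the same log, as they do in your excerpt.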

The Oracle error (ORA-07445 ... [SIGSEGV] [Address not mapped to object] ...) seems to indicate that some object that used to be there (perhaps a TCP socket or another process) no longer exists, e.g. a process that exited on a timeout or a socket that was closed. Other points to look at would include timeouts on TCP connections (firewalls perhaps?) and OS errors on the database host that may have caused the NB client to exit (message log, syslog, core dumps).

Hope this helps
Tony

On 21/03/12 00:27, Mandal, Ashoke wrote:
> Hi Tony,
>
> Here is the info on NetBackup Media Manager's logs
> a) The log under /usr/openv/netbackup/logs/user_ops/dbext/logs directory shows the following:
> 09:10:11 (197975.001) INF - Beginning restore from server phx00bs2 to client phx00apt1.
> 09:50:35 (197975.001) Status of restore from copy 1 of image created Mon Feb 27 19:09:01 2012 = the restore failed to recover the requested
>
> b) The log under /usr/openv/netbackup/logs/dbclient directory shows the following:
> 09:59:40.758 [6456]<4> VxBSASetEnv: INF - entering SetEnv - NBBSA_CLIENT_READ_TIMEOUT
> 09:59:40.758 [6456]<4> VxBSAGetEnv: INF - entering GetEnv - NBBSA_CLIENT_READ_TIMEOUT
> 09:59:40.758 [6456]<4> VxBSAGetEnv: INF - returning - 10800
> 09:59:40.758 [6456]<4> dbc_SetClientReadTimeout: INF - sending client read timeout
> 09:59:40.758 [6456]<2> xbsa_SetEnv: INF - leaving (0)
> 09:59:40.758 [6456]<8> int_ReadData: WRN - Failed to set client read timeout.
> 09:59:40.759 [6456]<2> sbterror: INF - entering
> 09:59:40.759 [6456]<2> sbterror: INF - Error=7504: Got end-of-file
>
> d) /usr/openv/netbackup/logs/bphdb directory didn't have any log.
>
> e) When I googled "WRN - Failed to set client read timeout" I found articles TECH73065 and TECH53002 on the Symantec site, and these suggest verifying the Media Mount Timeout. Our storage administrator verified that it was set to unlimited.
> <phx00bs2><root>bpconfig -U | grep -i mount
> Media Mount Timeout: 0 minutes (unlimited)
> Shared Media Mount Timeout:0 minutes (unlimited)
>
> Let me know if there are any other areas I should look at.
>
> Thanks,
> Ashoke
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of De DBA
> Sent: Tuesday, March 20, 2012 5:47 AM
> To: oracle-l_at_freelists.org
> Subject: Re: RMAN duplicate is failing : database session for channel <channel_name> terminated unexpectedly
>
> Reposted due to overquoting..
>
> Did you check the NetBackup Media Manager's logs? Perhaps it is trying to read from a tape that is not (no longer) mounted? Those logs should be on the NetBackup Media server, not necessarily on the database host (depending on your setup, of course).
>
> Cheers,
> Tony
>
>> On 20/03/12 14:02, Mandal, Ashoke wrote:
>>> I noticed that it generates the following error in alert log:
>>> Errors in file /phx11dbt1/u01/app/oracle/admin/vtwdmas/udump/vtwdmas_ora_6462.trc:
>>> ORA-07445: exception encountered: core dump [VxBSAGetData()+716] [SIGSEGV] [Address not mapped to object] [0x000000DF8] [] []
>>>
>>> The tracefile has the following message but the sbtio.log doesn't have any information as the size of sbtio.log is 0.
>>> SKGFQ OSD: Error in function sbtread2 on line 1156
>>> SKGFQ OSD: Look for SBT Trace messages in file /phx11dbt1/u01/app/oracle/admin/vtwdmas/udump/sbtio.log
>>> Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0xdf8, PC: [0xffffffff7d736d94, VxBSAGetData()+716]
>>>
>>> Couldn't locate any note in Metalink related to this error. Any suggestions will be appreciated.
>>>
>>> Thanks,
>>> Ashoke
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Mar 20 2012 - 19:57:21 CDT
