Re: Client and Shadow Process stuck communicating in Network Layer

From: Patrick Jolliffe <jolliffe_at_gmail.com>
Date: Fri, 26 Jun 2015 14:07:06 +0800
Message-ID: <CABx0cSWEFHBcrYLBXpcuP5FrfBrUrikM_5z=kZc3fW0M64riKQ_at_mail.gmail.com>



Thanks for the pointers, Stefan. I have checked, and this particular host is not using VIO.
The host OS is 6.1.8.15, which I believe contains the fixes for all three APARs. IV31011 is particularly interesting, as the connection IS loopback.

We actually have a similar, but different, problem right now with a DB link from Linux to AIX.
I can see the AIX side is stuck in write and the Linux side is stuck in read. (Note that this AIX host DOES use VIO.)
It is interesting to note that, using the "netstat -an" command mentioned in one of the links you sent, I can see the data sitting in the "Send Q" buffer. To me that proves this is an OS-level/networking issue, rather than an application/DB-level one.
At least that is some evidence I can use to direct our efforts.
Regards,
Patrick
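
P.S. In case it helps anyone checking for the same symptom, here is a minimal Python sketch of how the "Send Q" check could be scripted. It assumes the usual "netstat -an" column order (Proto, Recv-Q, Send-Q, local address, foreign address, state). The function name and the sample rows are made up for illustration and are not output from the hosts discussed here.

```python
def connections_with_pending_send(netstat_text, min_bytes=1):
    """Return (local, remote, send_q, state) for TCP rows with Send-Q >= min_bytes."""
    pending = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header lines, UDP rows, UNIX-domain sockets, etc.
        try:
            send_q = int(fields[2])
        except ValueError:
            continue  # not a numeric Send-Q column, so skip the row
        if send_q >= min_bytes:
            pending.append((fields[3], fields[4], send_q, fields[5]))
    return pending


# Hypothetical sample in AIX-style "address.port" notation.
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  10.0.0.5.1521          10.0.0.9.40112         ESTABLISHED
tcp4       0  65160  10.0.0.5.1521          10.0.0.7.51234         ESTABLISHED
tcp4       0      0  *.22                   *.*                    LISTEN
"""

if __name__ == "__main__":
    for local, remote, send_q, state in connections_with_pending_send(SAMPLE):
        print(f"{local} -> {remote}: {send_q} bytes unsent ({state})")
```

A single non-zero Send-Q reading can be transient, so comparing two or more snapshots taken some seconds apart is probably more telling than one sample.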

On 25 June 2015 at 19:14, Stefan Koehler <contact_at_soocs.de> wrote:

> Hi Patrick,
> "OS is AIX 6.1" and the call stack of the Oracle shadow process in read()
> gets my attention as some of my clients hit this issue several times :)
>
> Do you use VIO by chance? If yes (and even if not, you still can hit the
> LPAR OS issue - see last APAR), then you may have hit a nice "well-known" OS
> bug, which is/was pushed through various AIX / VIO levels.
>
> http://www-01.ibm.com/support/docview.wss?uid=isg1IZ59298
> http://www-01.ibm.com/support/docview.wss?uid=isg1IZ96155
> https://www-304.ibm.com/support/docview.wss?uid=isg1IV20656
>
> Please don't be confused by the AIX version level. If you follow the
> sysrouted APARs, you will find the fix for AIX 6.1 as well.
>
> Best Regards
> Stefan Koehler
>
> Freelance Oracle performance consultant and researcher
> Homepage: http://www.soocs.de
> Twitter: _at_OracleSK
>
>
> > Patrick Jolliffe <jolliffe_at_gmail.com> wrote on 25 June 2015 at 12:57:
> >
> > We have been getting a very occasional problem with a third-party
> > application, where the client process and the Oracle shadow process both
> > seem to hang waiting on read in the Oracle network layer.
> > The database is 11.2.0.4 and the OS is AIX 6.1, although this has persisted
> > through various versions of the database (and application).
> > Interestingly, it always seems to be within a similar call stack on the
> > application side (BatchReviseOnExit).
> > This leads me to suspect some kind of memory corruption on the client
> > application side, but there is nothing I can identify from the source we
> > have access to.
> > I have spent some time with support, but was just getting bounced between
> > the application side and the database side (even though both sides are
> > within Oracle; the application is JDEdwards).
> > I wonder if anybody has any ideas about what I can do, either
> > pre-emptively to gather more information for when it happens, or once the
> > application gets into this state.
> > I have pasted output from v$session and process stacks below.
> > Patrick
> >
> > select process, spid, state, status, event, paddr
> >   from v$session s, v$process p
> >  where p.addr = s.paddr and sid = 982
> >
> > 55575914/28050178/WAITING/INACTIVE/SQL*Net message from client/07000106C51BC9B0
> > [oracle_at_jdelogichk:/home/oracle]$ procstack 28050178
> > 28050178: oracleJDE (LOCAL=NO)
> > 0x090000000002dc94 read(??, ??, ??) + 0x274
> > 0x00000001009e63d4 ntusfprd(0x57b, 0x110a0cf16, 0xfffffffffff9050, 0x2822484100000020, 0x1003c5bb0) + 0x54
> > 0x0000000100a7d01c nsbasic_brc(??, ??, ??, ??) + 0x45c
> > 0x0000000100a7f3e0 nsbrecv(??, ??, ??, ??) + 0x80
> > 0x00000001018ae3c8 nioqrc(??, ??, ??, ??, ??) + 0x4448
> > 0x0000000108a80228 opikndf2(??, ??, ??, ??) + 0x7e8
> > 0x0000000108a528f8 opitsk(??, ??) + 0x318
> > 0x0000000108a820cc opiino(??, ??, ??) + 0x3ac
> > 0x0000000108a559ac opiodr(??, ??, ??, ??) + 0x38c
> > 0x0000000108a959ec opidrv(??, ??, ??) + 0x46c
> > 0x0000000108a8b4c8 sou2o(??, ??, ??, ??) + 0x88
> > 0x0000000100000a10 opimai_real(??, ??) + 0x230
> > 0x00000001000f7494 ssthrdmain(??, ??) + 0x114
> > 0x000000010000064c main(??, ??) + 0xcc
> > 0x0000000100000340 _text() + 0x70
> >
> > [jdespxx_at_jdelogichk:/home/jdespxx]$ procstack 55575914
> > 55575914: jdenet_k 6209
> > 0xd0121548 read(??, ??, ??) + 0x268
> > 0xd6ab3ff0 ntusfprd(??, ??, ??, ??, ??) + 0x50
> > 0xd6b4ae50 nsbasic_brc(??, ??, ??, ??) + 0x550
> > 0xd6b4cee0 nsbrecv(??, ??, ??, ??) + 0xa0
> > 0xd77bdf5c nioqrc(??, ??, ??, ??, ??) + 0x1bbc
> > 0xd65c6908 ttcdrv(??, ??) + 0x408
> > 0xd77dab2c nioqwa(??, ??, ??, ??, ??, ??) + 0x4c
> > 0xd66066a0 upirtrc(0x6, 0x24262888, 0x24ed1e5c, 0x24ed1f7c, 0x24ed2cbc, 0xf1eb2d74, 0x24ed34bc, 0x24260d50) + 0x740
> > 0xd6dd7f28 kpurcsc(??, ??, ??, ??, ??, ??, ??, ??) + 0x68
> > 0xd6e69f88 kpuexec(??, ??, ??, ??, ??, ??, ??, ??) + 0x2388
> > 0xd5e6b618 OCIStmtExecute(??, ??, ??, ??, ??, ??, ??, ??) + 0x18
> > 0xd79e7118 BFOCIStmtExecute(0x23c0ffa4, 0x24260c50, 0x24262888, 0x1, 0x0, 0x0, 0x0, 0x20) + 0x4c
> > 0xd79fd7b0 performRequestInternal(0x25033f38, 0x1) + 0x110
> > 0xd79fdd68 dballPerformRequest(0x25033f38) + 0xfc
> > 0xd79fddc4 DBPerformRequest(0x25033f38) + 0x14
> > 0xd3ad1ee4 JDB_DBPerformRequest(0x21ffbfa8, 0x25033f38, 0x25052798) + 0x40
> > 0xd3d57b54 TM_DBPerformRequest(0x20ffad18, 0x20ff4cb8, 0x25052798, 0x24a3da38) + 0x290
> > 0xd3ab2b64 DeleteTable(0x24a3da38, 0x2ff206bc, 0x0, 0x1, 0x2ff206dc, 0x20002, 0x1, 0x0) + 0x184c
> > 0xd3ab3e2c JDB_DeleteTable(0x24a3da38, 0x2ff206bc, 0x0, 0x1, 0x2ff206dc, 0x20002) + 0xb0
> > 0x206ccc74 BatchReviseOnExit() + 0x7f4
> > 0xd1335ae4 jdeCallObject(0x2228fe78, 0x0, 0x2426be08, 0x242e14c8, 0x24f784d8, 0x0, 0x0, 0x2228feb8) + 0x2420
> > 0xd3b750b4 JDEK_ProcessCallRequest(0x190bd1, 0xac1508fc, 0x0, 0x2228fc58, 0x23676c28, 0x20ff4cb8) + 0xce4
> > 0xd3b75b64 JDEK_StartCallRequest(0x190bd1, 0xac1508fc, 0x0, 0x2228fc58) + 0x46c
> > 0xd3b5a9a8 JDEK_DispatchCallObjectMessage(0x190bd1, 0xac1508fc, 0x0, 0x2228fc58, 0x0, 0x3850385, 0x0) + 0x4cc
> > 0xd7f48bb4 XMLCallObjectDispatch() + 0xd0
> > 0xd11c03ac callDispatchFunction(0x5, 0x190bd1, 0xac1508fc, 0x0, 0x2228fc58, 0x0, 0x385, 0x8000) + 0x52c
> > 0xd11c05f0 kernelMsgThread(0x2228fc18) + 0x1c0
> > 0xd11c200c processKernelQueueMsg(0x2228fc18) + 0x14
> > 0xd11b0730 processKernelQueue() + 0x49c
> > 0xd11a5798 JDENET_RunKernel(0x2ff22621) + 0x188
> > 0x10001f70 main(0x2, 0x2ff22568) + 0x290
> > 0x100001c0 __start() + 0x98
>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jun 26 2015 - 08:07:06 CEST
