Problem with Oracle process

From: Miguel Angel Toribios <matf_at_tid.es>
Date: Tue, 06 Mar 2007 09:15:08 +0100
Message-ID: <esj7vf$aa75@news.hi.inet>

Hi,
I have a single-threaded C++ process running on Solaris 2.8 with Oracle 8.1.6. Access to the DB is implemented via Pro*C:
EXEC SQL CONNECT :pc_nombre_fi
IDENTIFIED BY :pc_passwd_fi;
So, I have my process and an oracle child process:

root#m1cc1:>ps -fea | grep 29839

  usu 29839  1072  1 11:38:19 ?       36:42 ./serv 48
  oracle  1352 29839  0 11:38:42 ?        1:06 oracleorac (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

My problem is that sometimes (sporadically) the oracle child process dies, and my process then gets a SIGPIPE signal a bit later, when it next has to access the DB for a select, etc.
I haven't found out why the oracle process dies. No core file is generated, and there are no traces in the Oracle trace files or in the Oracle alert log.

Given this situation, I modified my SIGPIPE handler so that it reconnects to the DB, but then my process crashes and a core file is generated. Below is the core file examined with dbx, where the SIGPIPE signal can be seen. The numbers in parentheses refer to the stack frames in the trace:

1.- Oracle dies
2.- My process attempts to access the DB (frame 54)
3.- SIGPIPE is raised, since there is no longer an oracle process (frame 33)
4.- My SIGPIPE handler runs (frame 32)
5.- The handler attempts to reconnect (frame 31)

[31] BDS_srv_fin(0xffbea26e, 0x0, 0x0, 0x0, 0x2, 0x90a10), at 0xaf950
[32] tpcamain_trataSIGPIPE(0xd, 0x0, 0xffbea3c0, 0x0, 0x21bbc, 0xfdd04e0c), at 0x90aac
[33] sslsshandler(0xd, 0x0, 0xffbea3c0, 0xfe6b801c, 0x0, 0x0), at 0xfdd04f80
[34] MakeUser(0xd, 0x0, 0xffbea3c0, 0x3e9a80, 0x21698, 0xfdb3e168), at
0xfe69ae94
[35] sntpwrite(0x3f82a8, 0x3f87ac, 0x3f7c8c, 0x3fb836, 0xffbea9c4,
0x3e6764), at 0xfdb3e168
[36] ntpwr(0x3f7c48, 0x0, 0x301b98, 0xfdddeb4c, 0x3011f8, 0x3f82a8),
at 0xfdb3b6b4
[37] nspsend(0x3f7a60, 0x302718, 0x301a60, 0x3f7bf8, 0x0, 0x0), at
0xfd9fee80
[38] nsdofls(0x301a60, 0x3f7a60, 0xfdddeb4c, 0x8, 0x301b98, 0x3011f8),
at 0xfd9eaaa0
[39] nsdo(0x3f0008, 0x3f00c0, 0x302718, 0x0, 0x0, 0x3012f0), at 0xfd9e7f14
[40] nioqrc(0xc27, 0x1, 0x3011f8, 0x1, 0x0, 0xfdddeb4c), at 0xfda6a2b8
[41] ttcdrv(0x3e9a80, 0x3e66c4, 0x0, 0x0, 0x3e9d66, 0x3e9d64), at
0xfdb67a9c
[42] nioqwa(0x3e668c, 0x0, 0xfdb675fc, 0x3e9a80, 0x3e6610, 0x0), at
0xfda72964
[43] upirtrc(0x3e8c9c, 0x0, 0x0, 0x3e668c, 0x3e5f7c, 0x3e6764), at
0xfd90abd0
[44] kpurcsc(0x3e6764, 0xffbec0da, 0x3e65ac, 0x3e9a80, 0x3ea5bc,
0xfdb69fd4), at 0xfd94f46c
[45] kpuexecv8(0x0, 0x3eadbc, 0x46532c, 0x3e99cc, 0x3e8c9c,
0xfdddeb4c), at 0xfd9a5a08
[46] kpuexec(0x1, 0x4652e4, 0x202, 0x0, 0x0, 0x3e65e8), at 0xfd9a7654
[47] OCIStmtExecute(0x3e6764, 0x4652e4, 0x3e608c, 0x1, 0x0, 0x0), at
0xfd95dc1c
[48] sqlcucExecute(0x302a08, 0xfde08298, 0x1, 0x0, 0x0, 0x0), at
0xfd8dde7c
[49] sqlall(0xfde08298, 0x1, 0x4821f4, 0x1, 0x4, 0x1), at 0xfd8cce18
[50] sqlatm(0xfde08298, 0xfddb6898, 0x4, 0x1, 0x1, 0x4), at 0xfd8d4970
[51] sqlnst(0xfddb6898, 0xffbec970, 0xffbec970, 0x1, 0xfde08298,
0x4a82d8), at 0xfd8be7a4
[52] sqlcmex(0x0, 0x4a82d8, 0xffbec970, 0xfec50cf4, 0xfddb6898,
0xfdddeb4c), at 0xfd8a75b4
[53] sqlcxt(0x0, 0xfec50d10, 0xffbec970, 0xfec50cf4, 0xfe6be8a0, 0x0),
at 0xfd8a79f8
[54] BDInformeAlarmas_cuentaInformesAlarpex(cod_sscc = 50, num_sec =
13823), at 0xfe8b0c70
[55] ActuaWeb_RealizaEnvio(0xffbee358, 0x7, 0xffbee0b4, 0xffffffff,
0x2, 0xffbee359), at 0x5ad0c
[56] ActuaWeb_EnviaPanelesWeb(0x32, 0x9a, 0x21a640, 0x6, 0x2,
0x21a63e), at 0x5a9a4
[57] orden_O_panelweb(0xffbef680, 0x21a63e, 0x0, 0x4, 0xfef89870,
0x0), at 0x8bd84
[58] tpca_msg_o_llega_orden_o(0x21a63e, 0x1ff, 0x10, 0x1a97c4, 0x0,
0x0), at 0x8c328
[59] mensaje_eoc(0x21a624, 0x301868, 0x0, 0x0, 0x0, 0x0), at 0x90844
[60] leer_mensaje(0xa, 0xc8, 0xa, 0xfe034e38, 0xfe6bb1dc, 0xffffffff),
at 0x91fc8
[61] DRV_Ejecutar(0x0, 0xa, 0xfe02f8a8, 0xffbef888, 0xffbef908,
0xffbef738), at 0xfe00bc70
[62] LIBSGE_DRV_CicloMsj(0xfe036dc4, 0xffbef888, 0xfe02f8a8, 0xf4240,
0xfe03162c, 0xfe031618), at 0xfe00b324
[63] main(0x2, 0xffbefb04, 0xffbefb10, 0x19b800, 0x0, 0x0), at 0x9259c
(/opt/SUNWspro/bin/../WS5.0/bin/sparcv9/dbx)

+ Do you know what is happening?
+ Is there any known reason for the oracle process to die? An Oracle bug?
+ How can I analyse this behaviour in detail? (It is difficult to analyse because it occurs sporadically.)
+ Is it possible to recover from this abnormal situation? (I tried to reconnect, but my process crashes.)
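
On the recovery question, here is a minimal sketch of an alternative to reconnecting inside the handler. It assumes (unconfirmed) that the crash comes from calling the reconnect routine from the signal handler while an Oracle client call is still on the stack, as frames 31 to 54 suggest. In the sketch the handler only sets a flag, and the reconnect happens later from normal code, after the failed call has returned. All names here (g_db_link_broken, on_sigpipe, write_to_dead_peer, reconnect_db, access_db_with_recovery) are hypothetical. The dead oracle process is simulated with a pipe whose read end is closed, and reconnect_db is only a placeholder for the real Pro*C reconnect. How this interacts with the handler that the Oracle client itself installs (sslsshandler in frame 33) is not covered.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <signal.h>
#include <unistd.h>

// Set by the handler, read by normal code. volatile sig_atomic_t is the
// type that may safely be written from inside a signal handler.
volatile sig_atomic_t g_db_link_broken = 0;

// The handler does nothing except record that the link is gone:
// no DB calls, no memory allocation, no I/O.
extern "C" void on_sigpipe(int /*signo*/) {
    g_db_link_broken = 1;
}

// Installs the flag-only handler for SIGPIPE.
bool install_sigpipe_handler() {
    struct sigaction sa;
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL) == 0;
}

// Stand-in for "access the DB after the server process has died":
// writes to a pipe with no reader, which raises SIGPIPE and makes
// write() fail with EPIPE. Returns true if that failure was observed.
bool write_to_dead_peer() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    close(fds[0]);                       // no reader left, like a dead peer
    ssize_t n = write(fds[1], "x", 1);   // handler runs before write() returns
    int saved_errno = errno;
    close(fds[1]);
    return n == -1 && saved_errno == EPIPE;
}

// Placeholder for the real reconnect (assumption: it would issue
// EXEC SQL CONNECT again). Called from normal code, never from the handler.
bool reconnect_db() {
    g_db_link_broken = 0;
    return true;
}

// Performs one simulated DB access and, if the link broke, reconnects
// only after the failed call has fully returned.
bool access_db_with_recovery() {
    bool failed = write_to_dead_peer();
    if (failed && g_db_link_broken) {
        return reconnect_db();
    }
    return !failed;
}
```

In a real process the check of g_db_link_broken would sit in the main message loop, or right after each failed EXEC SQL statement, so that the reconnect never runs with client library frames underneath it.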

Any clue will be appreciated.

Thank you very much.

Received on Tue Mar 06 2007 - 02:15:08 CST