Re: Weird database hanging

From: Don Seiler <don_at_seiler.us>
Date: Tue, 18 Sep 2007 21:02:36 -0500
Message-ID: <716f7a630709181902t73b910a1re2b516e9186101ca@mail.gmail.com>

I assume Support mentioned open_cursors and session_cached_cursors because the ADDM report mentioned them.
You are right that I couldn't log in during the hang. However Support has clued me into the "sqlplus -prelim / as sysdba" that does let me log in. I was able to issue the oradebug commands then to try to get the level 10 system state dump. However the session gets booted before the dump finishes, although by then there is quite a lot of info in the dump already (a 7+ MB text file).
No it isn't RAC nor is it ASM.
Per Oracle Support I have generated ASH, AWR and ADDM reports for the time period of one of the "hangs" yesterday, in addition to the RDA tool. However the ashrpt file doesn't have the string "enq" in it anywhere. Perhaps I'm being dense as to what you meant.
We are not using huge pages. My SA says he'll set it up, since it can't hurt now. I'm reading the Puschitz tuning page about it now. I suppose shrinking the SGA or shared pool is always an option.

One other difference between old and new servers is that I have enabled the default degree of parallelism on our "warehouse" tables and indexes, and set NOPARALLEL on our frontend objects. On our old hardware it was kind of random which tables or indexes had it enabled and what degree was set. Also our parallel_max_servers is 32, where before it was 8. I don't think it would hurt to bring it down again.

Don.

On 9/18/07, Alex Gorbachev <ag_at_oracloid.com> wrote:
> Don,
>
> Kudos to Jeremiah for directing you away from guesswork path (thanks from
> BAAG) but Oracle Support normally does exactly that so don't wait for them -
> rather try to guide them to proper analysis. Did they give you explanation
> on how they wanted to avoid these hungs increasing OPEN_CURSORS and
> SESSION_CACHED_CURSORS?
>
> So far there seems to be a clear indication that new session creation is
> part of the problem so maybe this can give us some hints.
> Few things that come up to my mind to help with investigation:
> - Can you pre-establish SQL*Plus connection as SYSDBA and have it handy?
> During the hang run system state dump and haganlyze dump. This would provide
> some food for thoughts. Oracle support might request it anyway.
> - Do your application/bulk load process reconnect often? Maybe it's
> visibility of hung whereas you simply can't connect? If that's connection
> issue - audit enabled? AFTER LOGON trigger exist?
> - You haven't mentioned it explicitly or I missed but I assume it's not RAC.
> Is it?
> - Do you think you can generate ASH report and figure out object id for
> "enq: SQ - contention". This might give us the hot sequence. Same report can
> provide some hints (P1,P2,P3) for other waits. If instance was really bad,
> we might not have any ASH info if instance was really bad, though.
>
> Since you mentioned large SGA, do you use huge pages? Without them, there
> might be tremendous overhead of large SGA with many processes and especially
> overhead when process starts and starts accessing shared memory.
>
> Hope this helps a bit,
> Alex
>
>
>
> On 9/18/07, Don Seiler <don_at_seiler.us> wrote:
> >
> > I have opened an SR with Oracle, as it has hung 3 times today and
> > actually crashed once.
> >
> > When the database hang, Ignite is showing "latch: shared pool" and
> > "latch: library cache" waits. Otherwise I don't see these at all.
> >
> > Oracle has had me up OPEN_CURSORS and SESSION_CACHED_CURSORS, but I
> > did that last night (with instance restart) and, as I said, it has
> > hung 3 times and crashed once since then. Oracle's also telling me
> > that this is largely due to application coding. My problem with that
> > is that the application code has been in place for a while.
> >
> > To recap: we migrated to the 64-bit machine on Sep 1. Hanging has
> > occurred since Sep 13, seemingly during bulk load activity. Our SGA
> > is quite a bit larger (16 GB) than on our 32 bit box (1.5 GB).
> >
> > db_cache_size big integer 12G
> > shared_pool_size big integer 2G
> >
> > I haven't seen the "block change tracking buffer space" wait since
> > yesterday morning, thankfully.
> >
> > Any tips would be appreciated.
> >
> > Thanks,
> > Don.
> >
> > --
> > Don Seiler
> > oracle: http://ora.seiler.us
> > ultimate: http://www.mufc.us
> > --
> > http://www.freelists.org/webpage/oracle-l
> >
> >
> >
>
>
>
> --
> Alex Gorbachev, Oracle DBA Brewer, The Pythian Group
> http://www.pythian.com/blogs/author/alex
> http://www.oracloid.com
> BAAG party - www.BattleAgainstAnyGuess.com

-- 
Don Seiler
oracle: http://ora.seiler.us
ultimate: http://www.mufc.us
--
http://www.freelists.org/webpage/oracle-l

Received on Tue Sep 18 2007 - 21:02:36 CDT