Re: risk: hangalalyze and system state

From: John Hurley <hurleyjohnb_at_yahoo.com>
Date: Tue, 11 Feb 2014 15:53:07 -0800 (PST)
Message-ID: <1392162787.97041.YahooMailNeo_at_web181201.mail.ne1.yahoo.com>


This is, of course, also one of the standard tricks to have canned and tested regularly on test systems that match your production environment as closely as you can manage.

The usual setup is a Unix script that you can invoke with parameters (dump level, how many dumps to take, how far apart in time, system state only, hang analyze only, etc.).

You then document it ... test it on test systems ... keep it ready for a rainy day.
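
A minimal sketch of that kind of parameterized wrapper, written in Python rather than shell so it is easy to test. It only builds the SQL*Plus input text and executes nothing. The function name, parameter names and default levels (hanganalyze 3, systemstate 266) are assumptions for illustration; use whatever levels support asks for.

```python
def build_oradebug_script(hang_level=3, state_level=266, samples=2,
                          interval_secs=60, hanganalyze=True,
                          systemstate=True):
    """Return SQL*Plus input text for a series of dumps.

    Hypothetical helper: nothing is run here, the caller decides
    when (and whether) to feed the text to SQL*Plus.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    if not (hanganalyze or systemstate):
        raise ValueError("select at least one dump type")

    # Attach oradebug to our own session and lift the trace file size limit.
    lines = ["oradebug setmypid", "oradebug unlimit"]

    for i in range(samples):
        if hanganalyze:
            lines.append(f"oradebug hanganalyze {hang_level}")
        if systemstate:
            lines.append(f"oradebug dump systemstate {state_level}")
        # Wait between samples, but not after the last one.
        if i < samples - 1:
            lines.append(f"host sleep {interval_secs}")

    # Report where the trace file was written, then leave SQL*Plus.
    lines.append("oradebug tracefile_name")
    lines.append("exit")
    return "\n".join(lines) + "\n"
```

For example, build_oradebug_script(samples=2, interval_secs=30, systemstate=False) produces two hanganalyze calls 30 seconds apart and no system state dump. The resulting text would typically be saved to a file and run from a SYSDBA SQL*Plus session on a test system first.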

Then, of course, the most frustrating part can be when you have all of that ready, you hit a bad problem, you get all the evidence to Oracle Support, and they are still unable to debug it.

 

________________________________
 From: Austin Hackett <hacketta_57_at_me.com>
To: oracle-l digest users <oracle-l_at_freelists.org> 
Sent: Tuesday, February 11, 2014 4:01 PM
Subject: Re: risk: hangalalyze and system state
  

Hi Jeremy

I have had a bad experience with system state dumps on Solaris 10 (SPARC) and single instance 11.2.0.2.1. 

This is going back a couple of years, in a previous role, so sorry I can't provide much detail.

It involved an investigation into a parent cursor memory leak bug. Support asked me to take some system state dumps, which I did during a quiet time. This was a pretty high-volume system, so there was a fair amount going on even during quiet periods. I'd scoured MOS for bugs involving system state dumps in my version, asked the analyst multiple times for confirmation that it was safe to do this, got clearance from management, etc. Needless to say, within a minute or so of me running the first command, there were monitoring dashboards glowing red and application servers timing out. If I remember correctly, an exclusive mutex was held whilst the dump was being written to disk, leading to a fairly severe system hang. The feedback from support was that this kind of thing could happen in rare circumstances (which wasn't what they told me when I first asked!).

I remember reading another 11.2 war story shortly afterwards: http://oracledoug.com/serendipity/index.php?/archives/1645-Systemstate-Dump-warning.html

Unless things are totally hosed anyway, it's only something I'd do following a thorough search of MOS for bugs and a discussion with relevant people about the risk of taking the system state dump versus the impact/frequency of the issue at hand and any possible workarounds.

Thanks

Austin



--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 12 2014 - 00:53:07 CET