RE: Database hanging/crashing repeatedly

From: MacGregor, Ian A. <ian_at_slac.stanford.edu>
Date: Thu, 6 Oct 2011 08:59:09 -0700
Message-ID: <FD1D618E4F164D4C8BA5513D4268174A017140673C8C_at_EXCHCLUSTER1-02.win.slac.stanford.edu>



We had a somewhat similar problem, but in our case we couldn't even log into the server afterwards, not even to the SP. The situation was complicated because we had automatic failover set up. It performed beautifully, but there was some concern that the automatic failover was itself somehow inducing the failure.

In the end Oracle/SUN came in, replaced the disk controllers, and upgraded the firmware. We have not had any problems since, but we are not fully satisfied that this actually fixed the problem. The failure occurred for us while we were bringing many systems online, when the load on the database was quite light. Since the fix, the load on the machine has been fairly heavy. I know it is counter-intuitive, but we have seen the failure twice under those light-load startup conditions and never under heavy load. Yes, we have gone over our startup procedures. This is not just starting the database machine, but all the machinery required to establish, tune, and monitor the electron beam. The database machine is an active participant in our controls system.

Last weekend the power company had some problems, first reducing us to two-phase power, and finally cutting off all power to the site. So far no problems with the database servers have occurred. So perhaps the fix was indeed what was needed.

Ian MacGregor
SLAC National Accelerator Laboratory

Do your alert logs give you any clues? How about /var/adm/messages?
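
If it helps, here is a rough sketch of how one might scan both files for suspicious entries. It is only an illustration: the pattern list (ORA- codes plus a few generic keywords) and the function names suspect_lines and scan are my own assumptions, not anything Oracle or Solaris provides, and you would want to tune the keywords for your site.

```python
import re

# Assumed patterns, adjust as needed: Oracle error codes (ORA-nnnnn)
# plus a few generic trouble keywords that often show up in system logs.
SUSPECT = re.compile(
    r"ORA-\d{5}|\b(?:error|fail(?:ed|ure)?|timeout|offline)\b",
    re.IGNORECASE,
)

def suspect_lines(lines):
    """Yield (line_number, text) for every line that matches SUSPECT."""
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            yield number, line.rstrip("\n")

def scan(path):
    """Print matching lines from one log file, prefixed with path and line number."""
    with open(path, errors="replace") as handle:
        for number, text in suspect_lines(handle):
            print(f"{path}:{number}: {text}")
```

Calling scan("/var/adm/messages") and then scan() on the alert log for the affected instance would give two lists you can line up by timestamp around the time of the hang.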

-----Original Message-----

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Mandal, Ashoke
Sent: Wednesday, October 05, 2011 9:39 PM
To: oracle-l-bounce_at_freelists.org
Cc: oracle-l_at_freelists.org
Subject: Database hanging/crashing repeatedly

One of our databases has been crashing repeatedly in the same fashion. It sounds like the root cause is the same each time, but we are not sure what exactly is causing it.

Suddenly the database starts hanging. No user can log in, and each attempt gets the error (ORA-03135: connection lost contact). When I try to log in as sysdba, the session hangs and never connects to the database. Grid Control monitoring receives the error (Failed to connect to database instance: ORA-03135: connection lost contact (DBD ERROR: OCISessionBegin)).

--

http://www.freelists.org/webpage/oracle-l
Received on Thu Oct 06 2011 - 10:59:09 CDT