Re: FastStart Failover Problem.

From: MacGregor, Ian A. <ian_at_slac.stanford.edu>
Date: Tue, 5 Aug 2014 21:35:20 +0000
Message-ID: <9378E8E9-D099-49AD-96EE-3A685065CCBD_at_slac.stanford.edu>



Here’s what we know.
The output of "last reboot" shows the server going down at Fri Jul 25 23:58.

However, it did not make it all the way down: it remained pingable until it was powered off through its SP.

Nothing is in the Data Guard log for that time. Fast-start failover occurs when the primary database loses its connection to both the Data Guard observer and the standby database. The observer then tries to re-establish the connection, and after a threshold period (30 seconds by default) failover is initiated.
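As a rough illustration of the threshold rule just described, here is a minimal sketch under a simplified model: the observer remembers when it last reached the primary, and failover fires only if the standby has also lost the primary and the observer's outage has lasted at least the threshold. `FsfoState`, `should_fail_over` and `threshold_s` are hypothetical names for illustration only, not Data Guard broker APIs; the real decision logic lives inside the broker and observer.

```python
from dataclasses import dataclass


@dataclass
class FsfoState:
    """Hypothetical snapshot of what the observer knows (not a broker API)."""
    now_s: float                   # current time, in seconds
    last_primary_contact_s: float  # last time the observer reached the primary
    standby_lost_primary: bool     # standby also reports losing the primary


def should_fail_over(state: FsfoState, threshold_s: float = 30.0) -> bool:
    """Simplified rule: fail over only if the observer AND the standby have
    lost the primary, and the observer's outage has lasted >= threshold_s."""
    observer_outage_s = state.now_s - state.last_primary_contact_s
    return state.standby_lost_primary and observer_outage_s >= threshold_s


# The primary still answers on an existing connection: no failover.
print(should_fail_over(FsfoState(now_s=1000.0, last_primary_contact_s=999.0,
                                 standby_lost_primary=False)))
# Both observer and standby lost contact 45 s ago: failover is initiated.
print(should_fail_over(FsfoState(now_s=1000.0, last_primary_contact_s=955.0,
                                 standby_lost_primary=True)))
```

Under this model, a primary that keeps answering on an already-open connection never starts the outage clock, so the threshold is never reached.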

What happens if the connection is never lost? Our networking team also reported that they had contact with the box until it was powered off.

On Aug 4, 2014, at 9:42 PM, Nassyam Basha <nassyambasha_at_gmail.com> wrote:

Quite a strange situation with FSFO. According to the threshold value, the observer should have initiated failover. Even ssh wasn't working, so that is out of the ordinary, and we can say the server was not accessible.

In that case I would check the following things. I hope you have already done so; if so, please share your findings.

1) What is the configuration status in the broker? Are the primary and standby database(s) free of errors/warnings?
2) I hope the observer was started with a log file ("dgmgrl -logfile /tmp/obsvr.log"). Did you get a chance to check it for any information?
3) Check the FSFO status too: "DGMGRL> SHOW FAST_START FAILOVER;"
4) I would also suggest looking at the broker logs.

Thank you.

On Tue, Aug 5, 2014 at 4:10 AM, MacGregor, Ian A. <ian_at_slac.stanford.edu> wrote:

We’ve been running FastStart Failover for quite a while, and it has served us well. However, we had an interesting problem a couple of weeks ago. The primary server panicked but got stuck on the way down. It entered a state where it was not accepting any new connections, either via SQL*Net or ssh. However, existing connections were not closed. Thus Data Guard saw the primary database as being up, even though no one could connect to it.
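The hang described here (new connections refused, existing ones left open) can be illustrated with a toy model. `HungHost`, `persistent_probe` and `fresh_connection_probe` are hypothetical names, and the model is an assumption about how a monitoring connection behaves, not Oracle code: a health check that only pings an already-open session keeps reporting the host as up, while one that opens a new session each time notices the problem.

```python
class HungHost:
    """Hypothetical model of the stuck primary: sessions opened before the
    hang keep answering, while new connection attempts are refused."""

    def __init__(self) -> None:
        self.accepting_new_connections = True
        self.sessions: set[int] = set()
        self._next_id = 0

    def connect(self) -> int:
        if not self.accepting_new_connections:
            raise ConnectionRefusedError("host not accepting new connections")
        self._next_id += 1
        self.sessions.add(self._next_id)
        return self._next_id

    def close(self, session_id: int) -> None:
        self.sessions.discard(session_id)

    def ping(self, session_id: int) -> bool:
        # An existing session keeps responding even after the hang.
        return session_id in self.sessions


def persistent_probe(host: HungHost, session_id: int) -> bool:
    """Health check that reuses one long-lived session."""
    return host.ping(session_id)


def fresh_connection_probe(host: HungHost) -> bool:
    """Health check that opens (and then closes) a brand-new session."""
    try:
        session_id = host.connect()
    except ConnectionRefusedError:
        return False
    host.close(session_id)
    return True


host = HungHost()
monitor_session = host.connect()        # monitor connects while host is healthy
host.accepting_new_connections = False  # the stuck panic: new connections refused

print(persistent_probe(host, monitor_session))  # long-lived session: looks "up"
print(fresh_connection_probe(host))             # new session: outage detected
```

The design point is only that the two probes observe different things; which one a given monitoring setup effectively uses determines whether this kind of hang is visible to it.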

Ian MacGregor
SLAC National Laboratory
--
http://www.freelists.org/webpage/oracle-l

--

Nassyam Basha.
Oracle DBA
The Pythian Group <http://www.pythian.com/>
11g OCP Certified, Blogger
Co-Author: Oracle Data Guard 11gR2 <http://www.amazon.in/Oracle-Guard-11gR2-Administration-Beginners/dp/1849687900>
Member of Oraworld-team <http://www.oraworld-team.com/>
Visit My Blog <http://www.oracle-ckpt.com/>
Let's Connect - Linkedin Profile <http://in.linkedin.com/in/nassyambasha/>
My Twitter <https://twitter.com/nassyambasha>
My Facebook <https://www.facebook.com/nassyambasha>

--

Received on Tue Aug 05 2014 - 23:35:20 CEST
