RE: SunFire Server Hangs

From: MacGregor, Ian A. <ian_at_slac.stanford.edu>
Date: Wed, 11 Mar 2015 14:36:50 +0000
Message-ID: <54a88977968b4ea88e207df94705493a_at_exch13-mail03.win.slac.stanford.edu>



I don't take care of the box itself. I don't know enough about OpenBoot to answer.

Ian

-----Original Message-----
From: De DBA [mailto:dedba_at_tpg.com.au]
Sent: Tuesday, March 10, 2015 4:56 PM
To: oracle-l_at_freelists.org
Cc: MacGregor, Ian A.
Subject: Re: SunFire Server Hangs

Hi Ian,

Could it be that the OS is actually down, but the OpenBoot console is still running? I'm not sure whether that would answer a ping addressed to the OS, but it's worth a shot.

Cheers,
Tony

On 11/03/15 02:52, MacGregor, Ian A. wrote:

        Some more information:

        The machines are of two types: SunFire X4250 and X4270. /etc/release on one machine is:

	                      Solaris 10 10/09 s10x_u8wos_08a X86
	           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
	                        Use is subject to license terms.
	                           Assembled 16 September 2009

	and on the other two:

	cat /etc/release
	                        Solaris 10 5/09 s10x_u7wos_08 X86
	           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
	                        Use is subject to license terms.
	                             Assembled 30 March 2009

	We do apply Solaris patching quarterly.  

	The machine definitely tried to halt. After bringing the machine back up, "last reboot" shows the system down time at the time of the "freeze". The machine continued to respond to ping after that time. The responses came from the server itself, not from any firewall. So despite the reported system down time, at least part of the OS was up.
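
Because ICMP echo replies are generated inside the kernel's network stack, a host can keep answering ping while no userland process is making progress. One rough way to tell the two states apart is to check whether a daemon still produces output, for example by reading the SSH banner. A minimal Python sketch; the function names, port 22, and the assumption that sshd runs on these machines are illustrative and not taken from the thread:

```python
import socket


def read_banner(host, port=22, timeout=5.0):
    """Return the first bytes a service sends after connect, or None.

    A completed TCP handshake alone is handled by the kernel; receiving
    a banner means a userland process accepted the connection and wrote
    to it. Port 22 (sshd) is an assumption for illustration.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            data = s.recv(64)  # the timeout also applies to recv
            return data.decode("ascii", "replace").strip() or None
    except OSError:  # refused, unreachable, or timed out
        return None


def classify(ping_ok, banner):
    """Combine a ping result (obtained separately) with the banner check."""
    if banner:
        return "userland responding"
    if ping_ok:
        return "kernel answers ICMP, no userland response"
    return "host not responding"
```

For example, `classify(True, read_banner("somehost"))` would report the second state for a machine that pings but whose sshd no longer answers; `somehost` is a placeholder.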

	There is nothing in /var/log/messages, nor in any file that might be used by a utility to monitor the system; all I/O ceased at the time of the freeze. As far as we can tell, no resources were in any danger of exhaustion before the freeze.
	But that could be because it happened too quickly to be sampled.  I don’t think resource exhaustion is at all likely here.
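
If sampling granularity is the concern, a short-interval sampler could narrow the window. The following is only a sketch under stated assumptions: the log path, the one-second interval, and the function names are hypothetical, and it records just the load averages. Since it writes to a local file, it would stop recording once disk I/O stops, so the last line only bounds the time of the freeze.

```python
import os
import time


def format_sample(ts, load):
    """Format one log line from a Unix timestamp and a 3-tuple of load averages."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(ts))
    return "{} load1={:.2f} load5={:.2f} load15={:.2f}\n".format(stamp, *load)


def run(path, interval=1.0, count=None):
    """Append a sample every `interval` seconds; run forever if count is None."""
    taken = 0
    with open(path, "a") as fh:
        while count is None or taken < count:
            fh.write(format_sample(time.time(), os.getloadavg()))
            fh.flush()
            os.fsync(fh.fileno())  # push each sample to disk before sleeping
            taken += 1
            time.sleep(interval)
    return taken


if __name__ == "__main__":
    run("/var/tmp/loadavg.log")  # hypothetical path
```

Calling `os.fsync` after every write costs some I/O but makes it more likely that the final sample before a freeze reaches the disk.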

	There is a single RAID controller.  I thought the system disks might not be under this controller, but it turns out they are.
	So the complete loss of the I/O system is looking more likely.

	I did find that we were a bit behind on the BIOS level.


... <snip>

Received on Wed Mar 11 2015 - 15:36:50 CET
