Re: 1 minutes: best downtime story

From: Maureen English <maureen.english_at_alaska.edu>
Date: Thu, 28 Mar 2013 11:42:33 -0800
Message-ID: <51549D29.9020503_at_alaska.edu>



We have a scheduled power outage during our annual Fire Alarm and Safety Test. We start shutting down non-production servers Friday evening, less critical production servers on Saturday, and by midnight Sunday, everything has been shut down cleanly.

We start bringing things up in a very orderly fashion around 2am on Sunday and typically have everything up by about 4pm...14 hours later....

Yesterday, at about 3:30pm, we were notified that we had lost power and the UPS systems had about 30 minutes left. I'm not sure what caused the power to go out, or why there was only a 30-minute window, but we got almost every production system shut down cleanly before everything crashed at about 4:10pm.

An hour later, the power was back and stable, so we started bringing things up. We had a few problems with some critical machines, but by 11pm, only *6* hours after the power came back, all of the problems with the critical machines had been resolved and just about everything was back up and functioning!

So, this isn't really a 'best downtime' story, it's a 'best teamwork' story. The preparation we did last Fall for our scheduled outage saved us so much work and allowed us to bring everything up quickly and in an orderly fashion last night...even when we were all pretty tired!

Received on Thu Mar 28 2013 - 20:42:33 CET
