From: guy ruth hammond <>
Date: Mon, 12 Jun 2000 22:36:43 +0100
Message-Id: <>

Rachel Carmichael wrote:

> You said that the server crashed in January and that exports and shutdown of
> database before backup had therefore not been done since then. HOW COME NO
> ONE NOTICED???????????????????????????????? We are talking over 5 months
> here.

Ah, Rachel, you are young and trusting and we wuvs you!

A story I heard once went like this: A government installation had just hired a new operator for their mainframe. An "operator" is to a sysadmin what a junior DBA is to a senior DBA: they are expected to do the day-to-day maintenance of the system. In this case, the operator's primary job was to run the nightly backup script, which the sysadmin had written. (I don't know how it was organized before. I presume the sysadmin used to stay behind and do it, but then he acquired commitments that meant he had to leave earlier. Or maybe he just got bored. But I digress.) In the version I heard, the operator was a grad student who worked nights. The sysadmin showed her how to log on, run the backup script, etc., and she said she understood, so he left her to it. Every night she'd come in, start the job on a batch queue, then go and sit in the sysadmin's office and study.

The system was used to process records throughout the year, and was cleared out at the end of the year, so the volume of data in it (and hence the backup capacity required) increased over the year. Near the end of the year (I don't know whether it was a calendar, fiscal or academic year) the sysadmin asked the operator if she thought she'd need more tapes, but she just looked at him blankly. Every evening, she had logged in, started the backup job, and gone away - but after a short delay, the terminal had warned that there was no tape in the drive. After five minutes, the terminal security settings had cleared the screen and logged her off, so whenever she checked, there were no messages on the screen, and everything looked OK.

So what had happened was that she never knew she needed to insert the tapes into the drive manually, and he never thought he needed to review the backup logs now that he'd hired an operator. Fortunately, nothing had gone wrong during the year, and he was able to run a complete level 0 backup there and then.

I don't know whether this story is true or not, but it illustrates that as soon as a system starts to depend on human communication, it carries an inherent risk. It also illustrates that when you take responsibility for a system, you should check every assumption that you make and verify everything that you are told.


