Oracle FAQ Your Portal to the Oracle Knowledge Grid


Re: Quick Tale: Lost production database because of keyboard and Veritas Cluster Server

From: B.M. Wright <bmwright_at_xmission.xmission.com>
Date: Wed, 27 Feb 2002 23:08:00 +0000 (UTC)
Message-ID: <a5josg$i9g$2@news.xmission.com>


In comp.sys.sun.hardware milkfilk <milkfilk_at_yahoo.com> wrote:
> I'm posting to multiple groups so that I might save someone's neck.

> Me:
> I'm not a DBA and I'm no Veritas expert.

> Background:

> The cluster is configured for failover so if one server blows up, the
> other server mounts the disk and starts the oracle processes and
> starts up the db instances.

> What happened:
> I pulled the keyboard plug on our Sun server while rewiring our KVM
> switch. Yes, I know.

        As I'm sure others will point out, why the hell did you have a keyboard/graphics console hooked up to a database server?

> What this does (according to usenet posts) unfortunately, is send a
> Stop-A signal (in the form of an electrical short, I suppose). This
> shouldn't be a problem, because the server is simply 'paused' and in
> most instances you can simply type go and there shouldn't be any large
> consequences. Of course, you can't expect to hit Stop-A all the time
> and get away with it.

        Well, this is where you went wrong. You would have been much, much better off doing a 'sync' and panicking the machine at that point instead of typing 'go'.

> Our cluster is configured for failover, like I said, and the other
> server mounted the arrays and started up the oracle process. The
> instance hic-upped but it was running.

        Which is what it would be expected to do. However, the old instance was still running on the other machine, the one that was "paused". By the time you "un-paused" it, the other machine had already imported the shared diskgroup and started the database. Both were then writing to the same disks, and they trashed the data. This is why you should have simply crashed the machine instead of trying to recover.

> We have an old backup but it's not great and we were in the process of
> getting our backup procedure tested / working.

        It's critical production, you're still testing backups, and in the meantime you're monkeying around with hardware while the system is live? No more comments there, really.

        Even if the backup is "old", provided your DBAs are using transaction logging and have been backing up the logs, you should be able to 1) restore the "old" backup and 2) roll all the transaction logs forward, and be back to the point you were at when the last logs were dumped/backed up. The exceptions are bulk loads or other operations that don't use transaction logs (or that typically have logging turned off during the operation for performance reasons).
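
        The restore-then-roll-forward idea can be sketched as a toy model in Python. This is only an illustration, not how Oracle implements recovery: the function name `roll_forward`, the dict-based "database", and the numbered "logs" are all hypothetical. It shows that replay proceeds in sequence order from the backup point and stops at the first missing log, so you end up at the state as of the last log you actually have.

```python
def roll_forward(backup_state, backup_seq, archived_logs):
    """Replay archived logs on top of a restored backup (toy model).

    backup_state:  dict of key -> value as it was when the backup was taken
    backup_seq:    sequence number of the last log already reflected in the backup
    archived_logs: dict of sequence number -> list of (key, value) changes
    Returns (recovered_state, last_applied_seq).
    """
    state = dict(backup_state)  # step 1: restore the "old" backup
    seq = backup_seq
    # step 2: apply logs strictly in order; stop at the first gap
    while seq + 1 in archived_logs:
        seq += 1
        for key, value in archived_logs[seq]:
            state[key] = value
    return state, seq


if __name__ == "__main__":
    backup = {"a": 1}
    logs = {
        11: [("a", 2)],
        12: [("b", 5)],
        # log 13 was never backed up, so log 14 cannot be applied
        14: [("a", 99)],
    }
    print(roll_forward(backup, 10, logs))  # ({'a': 2, 'b': 5}, 12)
```

        Changes that never went through the logs (the bulk-load case above) simply do not appear in `archived_logs` in this model, so they cannot be recovered by replay.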

> Let me tell you that I'm shocked that an enterprise system can do
> this. A keyboard unplug started all of this. I'm looking at
> disabling the keyboard and this is my job as a UNIX SysAdmin to know
> this stuff, but the Veritas Cluster should have worked!

        It was your mistake; you can't really blame the software when it DID work. It did take over the instance on the other machine. I'm pretty sure that if you read around in the documentation (for the Oracle agent, the Oracle docs, or the VCS docs) and find a scenario like this, they will recommend doing exactly what I said: crash the machine that is no longer the owner of the service/database.

> If one server simply blows up, the other server should pick up the
> database and certainly not corrupt this "SCN" ...

-- 
B.M. Wright
bmwright_at_xmission.com
Received on Wed Feb 27 2002 - 17:08:00 CST
