Re: Quick Tale: Lost production database because of keyboard and Veritas Cluster Server
[snip]
 
> 	Well, this is where you went wrong. You would have been much, much
> better off doing a 'sync' and panicking the machine at that point instead
> of typing 'go'.
Valuable advice. Thanks.
> 
> > Our cluster is configured for failover, like I said, and the other
> > server mounted the arrays and started up the oracle process.  The
> > instance hiccuped but it was running.
> 
> 	Which is what it would be expected to do, however the old instance
> was still running on the other machine that was "paused".  By the time you
> "un-paused" it the other machine had already imported the shared diskgroup
> and started the database.  Both were writing to the disks and they trashed
> the data.  This is why you should have simply crashed the machine instead
> of trying to recover.
This is what I was afraid of when I saw two machines mounting the same volumes. Bad things happen at that point. But it worked for a brief period (the instance was serving data). This doesn't take me out of the cannon.
> 
> > We have an old backup but it's not great and we were in the process of
> > getting our backup procedure tested / working.
> 
> 	It's critical production and you're still testing backups, and in
> the meantime monkeying around with hardware while the system is live?  No
> more comments there really.
It's a tough thing to answer I suppose. Would have, could have, definitely should not have.
> 	Even if the backup is "old", provided your DBAs are using
> transaction logging and they've been backing up the logs, you should be
> able to 1) restore the "old" backup and 2) roll all the transaction logs
> forward, and be back to the point you were at when the last logs were
> dumped/backed up.  Exceptions being if they had done bulk loads or other
> things that don't use transaction logs (or typically have the logging
> turned off during the operation for performance reasons).
Does "transaction logging" here refer to archive log mode?
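The restore-then-roll-forward sequence quoted above can be sketched as a toy model. This is only an illustration of the idea, not how Oracle or any real database implements recovery: `LogRecord`, `restore_and_roll_forward`, and the sequence numbering are hypothetical names invented for the example. It assumes each logged change carries a consecutive sequence number, so a missing log stops recovery at that point, and a change made with logging turned off simply never appears in the list and cannot be replayed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    seq: int    # position in the log stream (hypothetical numbering)
    key: str    # item that was changed
    value: str  # value it was changed to


def restore_and_roll_forward(backup, backup_seq, archived_logs):
    """Replay backed-up log records on top of an old backup.

    Returns (state, last_applied_seq).
    """
    state = dict(backup)  # step 1: restore the "old" backup (as a copy)
    applied = backup_seq
    for rec in sorted(archived_logs, key=lambda r: r.seq):
        if rec.seq <= applied:
            continue      # change is already contained in the backup
        if rec.seq != applied + 1:
            break         # gap: a log was never backed up, so stop here
        state[rec.key] = rec.value  # step 2: roll this change forward
        applied = rec.seq
    return state, applied


# Backup taken at sequence 10; logs 11, 12 and 14 were backed up, 13 was not.
backup = {"acct": "100"}
logs = [
    LogRecord(12, "acct", "75"),
    LogRecord(11, "acct", "90"),
    LogRecord(14, "acct", "10"),
]
state, last = restore_and_roll_forward(backup, 10, logs)
# state is {"acct": "75"} and last is 12: recovery reaches the last
# contiguous backed-up log, and record 14 is not applied.
```

The point of the sketch is that the age of the backup matters less than whether every log written since then has been kept.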
> 
> > Let me tell you that I'm shocked that an enterprise system can do
> > this.  A keyboard unplug started all of this.  I'm looking at
> > disabling the keyboard and this is my job as a UNIX SysAdmin to know
> > this stuff, but the Veritas Cluster should have worked!
> 
> 	It was your mistake; you can't really blame the software when it DID
> work.  It did take over the instance on the other machine.  I'm pretty sure
> that if you read around in the documentation, either for the Oracle agent,
> the Oracle docs, or the VCS docs, and find a scenario like this, they
> will recommend doing exactly what I said (crash the machine that is no
> longer owner of the service/database).
> 
That would be a fantastic thing to ask in the Veritas class that I was planning on taking in 3-4 months (better make it 3). Anyone else noticing the lack of experience / learning process theme coming back?
Well, I'm afraid that I've gotten into a defensive position. The fact is, I can't justify my actions, now or at the time I took them: I had faith in a system that a lot of people don't have any idea about, because many things have been rushed.
If I gave a list of actions (do this, do this) for someone to predict, only a select number of people in the world would be able to tell me what the consequences would be.
I also think you all have given me some good advice / post-mortem.
Received on Thu Feb 28 2002 - 16:33:45 CST