Re: Synopsis of a database crash and recovery (or time to bash

From: Ruth Gramolini <rgramolini_at_tax.state.vt.us>
Date: Mon, 12 Jun 2000 11:43:56 -0400
Message-Id: <10526.108678@fatcity.com>


How mission critical can it be running on NT?

RBG

----- Original Message -----
From: <rsands_at_lendleaserei.com>
To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
Sent: Monday, June 12, 2000 11:08 AM
Subject: Re: Synopsis of a database crash and recovery (or time to bash

>
>
> I inherited a production, 'mission critical' (according to
> damagement) database 'running' on an NT server connected to
> an EMC box. We kept experiencing corrupt blocks, and after
> many q&a sessions with the sys admin, I discovered that his
> predecessor had set up the box so that the database, the Oracle
> software and the NT operating system were all actually running
> off the EMC box. (Have no idea what the drives actually on the
> server were doing.) The EMC box also happened to be located two
> floors away, and was connected to the server by a cable that
> wasn't real consistent. I ran nightly hot backups and exports,
> both of which were used frequently until the sys admin could
> reconfigure the box. (but still not unix)
>
> NT databases can be dangerous. Not only is the OS less reliable (IMHO),
> but the ease of setup can create a false sense of security/competence,
> and the defaults are set to keep things simple, which is not always the
> best approach. I agree with all the 'lessons' that
> have been posted: stay on unix when possible, configure carefully,
> run frequent backups, monitor them closely and test frequently.
> Also, be certain that you and the sys admin really understand
> each other when discussing what's actually behind a mount point!!
>
>
> Robyn Sands
> Lend Lease REI
> (a former Californian, working for an Australian company, located in
> Atlanta, GA)
>
>
>
>
>
> "Rachel Carmichael" <carmichr_at_hotmail.com> on 06/11/2000 11:03:58 PM
>
> Please respond to ORACLE-L_at_fatcity.com
>
> To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
> cc: (bcc: Robyn Sands/US1/Lend Lease)
>
> 5).
>
>
>
>
> Paul,
>
> having gone through a somewhat similar experience, my heart goes out to
> you. I do have one question though:
>
> how come NO ONE noticed that the exports weren't being done, that the
> backups were being done with an open database and that the tape drive was
> gone and no backups were being done?
>
> You said that the server crashed in January and that exports and shutdown
> of database before backup had therefore not been done since then. HOW COME
> NO ONE NOTICED???????????????????????????????? We are talking over 5 months
> here.
>
> Rachel
>
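One way to make sure somebody does notice is a small freshness check on the export dump file, run from a machine other than the database server. The following is only a minimal sketch in Python: the 26-hour threshold and the function names are assumptions for illustration, not anything from the original posts, and it only checks the file's modification time, not whether the export is actually usable.

```python
import os
import time


def backup_age_hours(path, now=None):
    """Return the age of the file at `path` in hours, based on its mtime."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 3600.0


def is_stale(path, max_age_hours=26.0, now=None):
    """Return True if the file is missing or older than `max_age_hours`.

    The 26-hour default is an assumed threshold for a daily job
    (24 hours plus some slack); adjust it to the real schedule.
    """
    if not os.path.exists(path):
        return True
    return backup_age_hours(path, now) > max_age_hours
```

A daily job could call `is_stale()` on the dump file and on the backup log and mail the DBA when either returns True. A test import of the dump is still the only real proof that it can be restored.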
>
> >From: Paul Drake <paled_at_home.com>
> >Reply-To: ORACLE-L_at_fatcity.com
> >To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
> >Subject: Synopsis of a database crash and recovery (or time to bash RAID
> >5).
> >Date: Sun, 11 Jun 2000 16:54:06 -0800
> >
> >This past week, an Oracle Database (v7.3.4 Workgroup) on WinNT Server
> >4.0 crashed at a remote Client Site. Database running NOARCHIVELOG.
> >Single RAID 5 volume (4 drives), single hardware RAID controller.
> >It was determined that the root cause of the crash was a faulty RAID
> >controller - and that the volume was unavailable for read/write.
> >That's where the problem seemingly started.
> >Okay, not a huge deal yet, as we have 2 options for recovery - last cold
> >backup, or import last full export (executed fresh daily).
> >It turned out that the tape drive had failed weeks earlier - and no
> >backups had been taken in quite some time.
> >Uh oh. Okay, well - we still have the dump file, right?
> >Wrong.
> >In January this server had a catastrophic failure during a move - and
> >had to be restored from tape.
> >Backup was made with NTBackup - without backing up the registry. Had to
> >re-install oracle binaries.
> >Database was restored and online in 4.5 hours after the call was
> >reported - not great, not bad.
> >What did not take place was the re-scheduling of jobs run by the
> >operating system.
> >Without the scheduled jobs running - the database had not been shut down
> >before *cold* backups.
> >So those backups were worthless *hot* backups run without taking
> >tablespaces offline.
> >Without the scheduled jobs running - the daily export job had not
> >executed.
> >So the only recovery option left was an export from January, taken
> >before that earlier crash.
> >Okay, we'll try to recover the database.
> >Startup mount - no problem. Can view all of the datafiles, status is
> >ONLINE.
> >Can view the online redo logs - all seem to be fine.
> >Alter database open - ORA-03113 - end of file on communication channel.
> >Core dump.
> >Attempted to mount and recover database - received message that no
> >recovery was needed.
> >Called oracle support.
> >
> >Opened a severity 1 TAR.
> >
> >Support stepped me through attempts to re-open and recover the database.
> >Still ORA-03113.
> >Got a full backup of all the existing files before they broke out the
> >jackhammer.
> >After exhausting all options, had to force open the database - which was
> >then corrupted.
> >I purposely forgot that init parameter used to force it open - I never
> >want to see it again.
> >Got most of the data out - still some was inaccessible - so recovery was
> >incomplete.
> >This event cost me more than 2 days of time that I didn't have.
> >Grabbed the compressed export files and imported them into a new
> >instance on my machine at work.
> >The crashed Server was rebuilt during this time - 2 RAID 1 volumes (new
> >RAID controller) - new OS install.
> >
> >** What I really wanted to get across is this: **
> >If you're relatively new to managing Oracle Databases - particularly
> >on WinNT - please understand this:
> >
> >Running all files on a single RAID 5 volume is extremely bad.
> >Log files and control files most certainly should not be stored on RAID
> >5 volumes.
> >Swap space on RAID 5? Are you kidding?
> >(A well-tuned Oracle Instance won't be using the OS pagefile.sys at all
> >anyway)
> >
> >As someone else on the list once said: (to summarize)
> >
> >You're better off running JBOD (just a bunch of drives) than running
> >only RAID 5.
> >Maybe just mirror your OS and oracle binaries, control files, parameter
> >files.
> >Have the other drives set up as single drive RAID 0 volumes (or no
> >RAID).
> >Have a solid backup strategy in place, configure a disaster recovery
> >agent to avoid a bare metal recovery.
> >If the database is going to be at a remote site, use third party backup
> >utilities for hot backups.
> >It's not that hard to write the hot backup script - but it is more
> >difficult to restore from a home-grown script than to have a GUI in
> >front of the user that may be performing the recovery.
> >If you wrote the scripts to perform the hot backup - you *will* be
> >performing the recovery.
> >If it's just a pre-configured restore job to run in a tool such as
> >Veritas NT Backup - even a Mac User could run it.
> >
> >If you get the chance to specify the box - use multiple RAID controllers
> >and DUPLEX across them.
> >When the machine loses a RAID controller - you can keep running until
> >the new one arrives, without even a hiccup.
> >
> >I haven't completely sworn off RAID 5 - I think that it's a good option
> >compared with running RAID 0 for READ ONLY tablespaces. But for anything
> >that you have to write to - I would have to recommend against it.
> >
> >As far as recovery options running NOARCHIVELOG - there are 4:
> > recover from cold backup
> > recover from logical export
> > dice.com (dbajobs.com, etc.).
> > the 10K tool from Oracle.
> >
> >My ideal config uses 2 dual-channel RAID controllers, which gives you 4
> >I/O channels - 2 internal and 2 external. The newer 5U rack mount storage
> >cabinets can contain up to 14 drives.
> >Just demand the "extra hardware".
> >Make sure that the backplanes are split - internal and external. Order
> >the extra cables needed.
> >Duplex all RAID volumes. Yes, you'll take a slight hit on throughput.
> >Big deal.
> >One more pair of drives would meet OFA standards (7 vols). Couldn't fit
> >it in this config.
> >So I put system on volume 0.
> >
> >Volume  RAID  Drives  Size GB  Tablespaces  Stores
> >0       1     2       8.7      System       OS, Oracle Binaries, Control File1
> >1       1     2       8.7                   4 online redo_logs, archlogs, export files
> >2       1     2       8.7      RBS          control file2
> >3       1     2       8.7      TEMP         control file3
> >4       1     2       8.7      INDEX_DATA
> >5       0+1   4+      17.4     USER_DATA
> >
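The numbers in the table above can be sanity-checked with a few lines of Python. This is only a sketch of the arithmetic: the "4+" drive count on volume 5 is treated as exactly 4 (an assumption), and the per-drive size is inferred from the fact that RAID 1 and RAID 0+1 both keep two copies of the data.

```python
# Volumes from the table: (volume, raid_level, drives, usable_gb).
# Assumption: "4+" drives on volume 5 is taken as exactly 4.
VOLUMES = [
    (0, "1", 2, 8.7),
    (1, "1", 2, 8.7),
    (2, "1", 2, 8.7),
    (3, "1", 2, 8.7),
    (4, "1", 2, 8.7),
    (5, "0+1", 4, 17.4),
]


def totals(volumes):
    """Return (total physical drives, total usable GB) for the layout."""
    drives = sum(v[2] for v in volumes)
    usable_gb = round(sum(v[3] for v in volumes), 1)
    return drives, usable_gb


def implied_drive_size_gb(drives, usable_gb):
    """Per-drive size implied by a mirrored volume.

    With two copies of the data, usable space is (drives / 2) * drive size.
    """
    return round(usable_gb / (drives / 2), 1)
```

With these inputs `totals(VOLUMES)` gives 14 drives, which matches the 6 internal plus 8 external drives described in the post, and about 60.9 GB usable. `implied_drive_size_gb` gives 8.7 GB per drive for every volume, so mirroring everything leaves half of the raw capacity usable.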
> >This config had 6 internal drives, 8 external drives - no hot spares.
> >I like the idea of having a pair of drives that are only writing
> >actively to the redo logs. (except for nightly exports).
> >This keeps the drive heads on the current redo log track - not searching
> >all over the drive for whatever block is asked of it.
> >If the drive heads are already on the right track, 1/10,000th of a
> >second isn't long to wait for a write, compared with a 7 ms avg seek
> >time.
> >With Ultra 160/m drives these days and 64 bit, 66 MHz PCI buses, access
> >times are the rate-limiting factor - not pure I/O throughput.
> >If you have a write-back cache enabled, it's not such an issue - but I'm
> >still a little sceptical about enabling that, even with a battery backup on
> >the controller card and a UPS feeding the server.
> >
> >One more thing - the entire GUI concept usually lacks the most important
> >thing - a scripted way to reproduce the configuration that you just
> >made. If you are going to re-create from bare metal, you have to be able
> >to reproduce all of your Database's settings.
> >Don't use the GUI NT Resource Kit scheduler for adding jobs - do it with
> >a script so that these jobs can be reproduced.
> >Recovery from a tape backup won't restore the scheduled jobs.
> >
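As a rough illustration of keeping the schedule in a script, the sketch below generates a .cmd file of NT `at` commands from a job list kept under version control. The job times and script paths are hypothetical placeholders, not the jobs from the original site, and it assumes the NT Schedule service is running so that `at` can register the jobs.

```python
# Hypothetical job list: (start time, days, command). Paths are placeholders.
JOBS = [
    ("01:00", "M,T,W,Th,F,S,Su", r"C:\dba\scripts\full_export.cmd"),
    ("03:00", "Su", r"C:\dba\scripts\cold_backup.cmd"),
]


def at_command(start, days, command):
    """Build one NT `at` command line for a recurring job."""
    return 'at %s /every:%s "%s"' % (start, days, command)


def render_script(jobs):
    """Return the text of a .cmd file that re-creates every job in `jobs`."""
    lines = ["@echo off", "rem Re-create scheduled DBA jobs after a rebuild"]
    lines.extend(at_command(*job) for job in jobs)
    return "\r\n".join(lines) + "\r\n"
```

Writing the result of `render_script(JOBS)` to a file stored with the other rebuild scripts (and off the server) means that after a bare metal restore the schedule comes back by running one file, rather than by remembering what was clicked in the GUI.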
> >drakonian.
> >--
> >Author: Paul Drake
> > INET: paled_at_home.com
> >
> >Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
> >San Diego, California -- Public Internet access / Mailing Lists
> >--------------------------------------------------------------------
> >To REMOVE yourself from this mailing list, send an E-Mail message
> >to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
> >the message BODY, include a line containing: UNSUB ORACLE-L
> >(or the name of mailing list you want to be removed from). You may
> >also send the HELP command for other information (like subscribing).
>
>
> --
> Author: Rachel Carmichael
> INET: carmichr_at_hotmail.com
>
>
>
>
>
>
>
> --
> Author:
> INET: rsands_at_lendleaserei.com
>
Received on Mon Jun 12 2000 - 10:43:56 CDT
