RE: The Idiot's Guide to Recovering from Truncating a Production Table

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 20 Mar 2008 11:25:39 -0400
Message-ID: <01bd01c88a9e$a8075da0$1100a8c0@rsiz.com>


Not sure I wanted my name associated with this thread, but just in case someone might follow this, it should be noted that somewhere in the neighborhood of step 7 you need to preserve the current contents of your on-line redo logs if you intend to give yourself the additional chances to recover you want. Consider what happens if something goes "bump" after step 11.2, such as realizing you typed in the a time a little after the referenced event. Whatever your religious views about putting online redo logs on backup sets, preserving a copy of the contents of the current online redo logs is the first step of recovery if you would like to preserve the possibility of more than one try past the end of the last archived redo log. Likewise, backing up the controlfile to trace to get a human readable reconstruction source is potentially very useful.

And while I have no personal experience in the matter, Tim's business case seems likely to be on target. A large class of systems could continue to function reasonably and possibly you can defer interruption by slapping full auditing on the table in question and sometimes by revoking update. The dollars lost per hour of outage versus the potential liability of mutilated transactions in dollar cost is probably worth briefly considering, and I hope no one is able to issue such commands against something like the production running copy of an air traffic control system.

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Tim Gorman
Sent: Wednesday, March 19, 2008 9:11 PM
To: oradbt054_at_gmail.com
Cc: Oracle-L Freelists
Subject: Re: The Idiot's Guide to Recovering from Truncating a Production Table

David,

It is not always easy to determine the importance of a single table in a complex application, never mind the downstream impact of a unilateral decision to SHUTDOWN in the middle of a work day. In many circumstances, step #2, not step #1, could be a a true career-limiting move. Personally, I'd reverse the order of steps #2 and #3...

Also, step #13 is missing an important step between the SHUTDOWN and STARTUP command -- BACKUP DATABASE (full or level-0). Before you open that newly-recovered database for users, after a prolonged and unexpected outage, you need to ensure that you can recover that mutilated database from any further failures. And, you may very well have to do that full database backup "cold", with the users shut out. Think about it -- if some failure should occur prior to the completion of a "hot" backup? So, step #13 can be maddeningly prolonged, but it is necessary, unless one wishes to start from step #1 all over again...

But all this can be avoided if you instead perform a point-in-time recovery to a separate "clone" database, which consists of just all the tablespaces that contain undo/rollback segments (including SYSTEM) and the set of datafiles containing the table that was truncated. Once this offline clone database is restored, you can then import the recovered table back into the production database. The step involving some form of import of the table may seem inefficient, but the advantages are that you can retain any records added after the truncate, you don't lose the data in the other tables, and you do not have to perform the full/level0 database backup in order to maintain recoverability.

The last time I had to do this in production was prior to the advent of RMAN. We did a PITR recovery to a clone database, imported the table, brought the applications back online within 90 mins of the DROP TABLE, then went out and got smashed the rest of the afternoon. We returned to the office after dark to get our cars, and found a new production outage in progress. As I was smashed and this happened back in the 20th century, 13 years ago, I don't recall what that second outage was, but I remember our happy buzz turning to mean hangover in an instant, followed by a late and painful night. Funny the things you remember... ;-)

Hope this helps...

-Tim

David Taft wrote:
> All,
>
> After reading some of the posts on the "DBA Future" thread, it
> reminded me of the worse mistake I ever made as a DBA. Immediately
> after the incident I documented the actual recovery procedures partly
> in jest. Thankfully I haven't had a job interview in 15 years and
> don't anticipate having one anytime soon, so I feel relatively safe
> sharing this here for anyones possible enjoyment. :-)
>
> Cheers,
>
> David
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> The Idiot's Guide to Recovering from Truncating a Production Table
>
> In this scenario you just truncated a table in production that you
> meant to truncate in development. Now what do you do?
>
> 1) Shout OHHH SH**! (fill in the '*' with whatever letters you think
> are appropriate)
>
> 2) After you've gotten over the initial shock of this monumental
mistake, do a
>
> Shutdown immediate;
>
> 3) Stop, think and immediately tell your manager or coworkers what you
did.
>
> 4) Curse a coworker who had nothing to do with YOUR mistake. It won't
> help the situation, but now you can feel bad for two stupid mistakes.
>
> 5) Apologize for cursing your coworker and accept FULL responsibility
> for YOUR mistake.
>
> 6) Now that you are rational, calmly discuss with your
> manager/coworkers the best course of action. If it is decided to
> proceed with a point-in-time recovery, then continue with the
> following steps.
>
> 7) Backup the controlfile in case you have to restart the recovery:
>
> Alter database backup controlfile to
> '$ORACLE_HOME/dbs/$ORACLE_SID_backup.ctl';
>
> 8) Check the alert log for the exact time you started the "shutdown
immediate".
>
> 9) Think about how much time may have passed between step 1 and 2,
> then decide on a safe point-in-time before the truncate occurred.
>
> 10) Run 'rman'.
>
> RMAN> connect target /
> RMAN> restore database;
> -- After restore completes
> RMAN> exit
>
> 11) Run 'sqlplus "/ as sysdba'
>
> SQL> recover database until time 'YYYY-MM-DD:HH24:MI:SS';
> SQL> alter database open resetlogs;
> --shutdown may take a while to apply undo from recovery.
> SQL> shutdown immediate;
> SQL> Startup restrict;
>
> 12) Check to make sure everything looks OK.
>
> 13) If everything looks good, then open the database to the users:
>
> SQL> shutdown immediate;
> SQL> startup;
>
> 14) Thank your coworkers for all their help and be grateful you still
> have a job.
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l




--
http://www.freelists.org/webpage/oracle-l
Received on Thu Mar 20 2008 - 10:25:39 CDT

Original text of this message