Avoid common Oracle Recovery mistakes

Donald K. Burleson

Even Oracle Certified DBAs cringe at the thought of performing a real-world database recovery. As disks and other hardware have become extremely stable, many Oracle DBAs have never experienced the adrenaline rush of a full-blown Oracle recovery.

With a mission-critical database at stake, these Oracle recoveries are often performed under great stress, especially when thousands of employees cannot do their jobs until you have recovered their Oracle system.

I am amazed how many Oracle shops do not discover that their backups are bad until they are doing a mission-critical recovery. The most common causes of bad backups include:

  • Cold backups while the database is running - This is a very common Oracle backup error. When you restore the media, Oracle will not open the database because the system change numbers (SCN) in the file headers do not match.

  • Bad Media - Many Oracle backups do not check the media to ensure that the database has been successfully written to the backup tape or disk. I have seen many cases where the backup job writes an empty or incomplete backup, or where parity errors exist. Some shops re-read their backups to guard against parity errors.

  • No ARCHIVELOG mode - I have seen many shops that only discover that they cannot roll forward when they attempt a recovery. Many DBAs have lost their jobs when they had to tell management that many days of work have been lost forever.

  • Bad hot backups - There are many ways to perform a hot backup, and many of them will not work properly. Hot backups are tricky, and the prudent DBA will insist that recovery from the hot backup is tested, or at least get a CYA memo from management if they refuse to test their Oracle recovery capabilities. Remember, someone IS going to be fired when a mission-critical database cannot be restored, and this memo could save your job.
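Two of the checks in the list above lend themselves to automation. The following is a minimal Python sketch, not a definitive implementation. The helper names (`sha256_of`, `verify_backup_copy`, `archivelog_enabled`) are hypothetical, the file comparison assumes the source file cannot change during the check (for example, a datafile of a cleanly shut-down database), and `cursor` is assumed to be a Python DB-API cursor on an Oracle connection.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup_copy(source, backup):
    """Re-read a backup file and compare it with its source file.

    Returns a list of problems; an empty list means the copy matched.
    Assumption: the source does not change while the check runs.
    """
    source, backup = Path(source), Path(backup)
    if not backup.exists():
        return [f"{backup} is missing"]
    if backup.stat().st_size == 0:
        return [f"{backup} is empty"]
    if backup.stat().st_size != source.stat().st_size:
        return [f"{backup} is incomplete"]
    if sha256_of(backup) != sha256_of(source):
        return [f"{backup} differs from {source}"]
    return []


def archivelog_enabled(cursor):
    """Return True if V$DATABASE reports ARCHIVELOG mode.

    Assumption: `cursor` is a DB-API cursor on an Oracle connection.
    """
    cursor.execute("SELECT log_mode FROM v$database")
    row = cursor.fetchone()
    return row is not None and row[0] == "ARCHIVELOG"
```

A check like this only confirms that the bytes were written and that archiving is switched on. It does not replace a tested restore, which remains the only real proof that a backup is recoverable.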

There are also areas of Oracle database recovery where the DBA could make a serious error. These mistakes are often the result of stress and poor judgment, and even Oracle Technical Support may fail to insist on these precautions. Here are two common Oracle recovery issues to avoid:

Back up your failed database first

The very first task of an Oracle DBA should be to back up the corrupt database. Sadly, many Oracle shops do not test their recoveries, and in cases where you discover that your backups are not recoverable, you may be glad that you have a copy of the original corrupt database.
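As a rough illustration of this precaution, the Python sketch below copies the files of the failed database into a fresh directory before any restore is attempted. The function name `preserve_failed_database` and its arguments are hypothetical. It assumes the instance is shut down so the files are not changing, and that you already know which datafiles, control files and redo logs belong to the database.

```python
import shutil
from pathlib import Path


def preserve_failed_database(db_files, safe_dir):
    """Copy every file of the failed database into a new directory.

    Refuses to reuse an existing directory, so an earlier preserved
    copy is never overwritten. Returns the list of copies made.
    """
    safe_dir = Path(safe_dir)
    # exist_ok=False raises FileExistsError if the directory is already there.
    safe_dir.mkdir(parents=True, exist_ok=False)
    copies = []
    for src in map(Path, db_files):
        dest = safe_dir / src.name
        if dest.exists():
            # Two source files share a base name; do not clobber the first copy.
            raise FileExistsError(f"{dest} already copied")
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        if dest.stat().st_size != src.stat().st_size:
            raise OSError(f"{dest} is incomplete")
        copies.append(dest)
    return copies
```

Refusing to overwrite is a deliberate choice here: under stress it is easy to run the same step twice, and the preserved copy is only useful if it still reflects the database as it was when it failed.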

Rarely force open a bad recovery

When an Oracle database recovery is corrupt, the Oracle database will not open. This is for good reason: it is Oracle's way of sparing you the serious complication of manually repairing thousands of corrupt database blocks. When a recovered database will not open, you have three choices:

  1. Go to an earlier backup - It is far better to have a longer roll-forward period than to have to repair Oracle corruption.

  2. Restore the initial failed database - This may require less manual repair time than forcing open a bad recovery.

  3. Force-open the database - It amazes me how many Oracle DBAs will have a failed recovery and call Oracle Technical Support asking them to force-open the database without considering the alternatives. Forcing open a corrupt database is a last resort and should only be done with the help of Oracle Technical Support and when there are no other options. At this juncture, many Oracle DBAs will realize that they are going to be fired anyway, and walk off the job.
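The ordering of these three choices can be written down as a small decision helper. This is only a sketch: the names `RecoveryOption` and `next_recovery_option` are hypothetical, and ranking choice 1 ahead of choice 2 is an assumption based on the order of the list. The one firm rule is that a force-open comes last.

```python
from enum import Enum


class RecoveryOption(Enum):
    EARLIER_BACKUP = "restore an earlier backup and roll forward longer"
    ORIGINAL_FAILED_DB = "restore the preserved copy of the failed database"
    FORCE_OPEN = "force-open, only with Oracle Technical Support"


def next_recovery_option(has_earlier_backup, has_failed_db_copy):
    """Pick the next thing to try when a recovered database will not open.

    Assumption: options are tried in the order the list gives them,
    with a force-open considered only when nothing else is left.
    """
    if has_earlier_backup:
        return RecoveryOption.EARLIER_BACKUP
    if has_failed_db_copy:
        return RecoveryOption.ORIGINAL_FAILED_DB
    return RecoveryOption.FORCE_OPEN
```

For example, `next_recovery_option(False, True)` returns `RecoveryOption.ORIGINAL_FAILED_DB`, which is only possible if the failed database was copied before the recovery began.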

In sum, the Oracle DBA is the custodian of the database and must be prudent and cautious with the mission-critical system, always following best practices to ensure recoverability of the Oracle system.