Re: A Real Stinker to Solve - EXPERTS APPLY WITHIN (DB Crashing)

From: <wfholmes_at_yahoo.com>
Date: 12 Oct 2006 12:20:39 -0700
Message-ID: <1160680839.484959.126500@e3g2000cwe.googlegroups.com>

Johne_uk wrote:
> wfholmes_at_yahoo.com wrote:
>
> >
> > Look at the following link:
> >
> > http://www.experts-exchange.com/Databases/Oracle/Q_20095668.html
>
> I know its a bit cheeky but could you possibly post the text from the
> article below in this thread. The only way to read it is to pay a
> subscription.
>
> thanks
> John

It's a bit long, but here goes:

With the above mentioned error message I am unable to start my database.
When I start the databse from server manager I am getting the message.
Let me know the solution for the above message immediately.Becoz of this project is getting delayed.

Hi,

Please check your background process trace file for further messages.or let us know about your trace file.then only we can help u.

kapil

Comment from balasubram
Date: 03/22/2001 10:20PM PST
Comment

Hi,
Is there anyother message which accompanies theis ORA-00470.Please put the details which it has given in the .trc file generated at the bdump destination
Bala

Comment from kitttu
Date: 03/22/2001 11:17PM PST
Author Comment

Thanks for ur immediate response,
Since there are many trace file,I could not able to find which trace file I should search.
let me know which file I need to search.

Comment from misho2000
Date: 03/22/2001 11:38PM PST
Comment

Look in your alertSID.log file. You will see message describing your lgwr failure. Below it you will see the name of trsce file in which descriptive information is written( about lgwr process). Then go to your udump directory and open the file with this name. All details are in it.

Comment from balasubram
Date: 03/22/2001 11:39PM PST
Comment

Hi,
Any of the .trc file would be helpful.
Possibly take the first .trc file generated by looking at the time stamp of the file.
Bala

Comment from bpouydog
Date: 03/22/2001 11:57PM PST
Comment

There are several possible solutions to this error, and they depend on your hardware & current configuration.

The most common reason for this error is that new datafiles have been recently added to your system. You need to verify and possibly increase your MAXFILES setting.

Example:

On checking the MAXFILES kernel parameter, you discover that it is set to 60. Checking V$datafiles shows 47 data files. Computing the total of data files, control files, mirror redo logs, init.ora and alert.log files, total
exceeds the 60 file limit.

You are receiving the following errors in PMON, LGWR and SMON trace files:

error 470 detected in background process OPIRIP: Uncaught error 447. Error stack: ORA-00447: fatal error in background process ORA-00470: LGWR process terminated with error

No errors are being reported in the alert.log. You did receive the following
alert.log entry: "ORACLE Instance <name> - Can not allocate log, archival
required"

You are forced to do the following to get the database back up:

SHUTDOWN ABORT
STARTUP NORMAL
SHUTDOWN IMMEDIATE
STARTUP NORMAL

Here, simply increase the value of MAXFILES.

If the ora-470 is accompanied by ora-264, verify that the call stack trace matches the call stack
below, and that the same series of errors were initially returned. The ORA-00264 is very
important because it tells us that no recovery is needed. ** THIS IS MOST IMPORTANT TO THIS APPROACH.** Opening the database with resetlogs recreates the redo logs and corrects the problem that made the redo file
unreadable.

Take a COLD backup.
SVRMGR> connect internal
SVRMGR> startup mount
SVRMGR> recover database until cancel;
SVRMGR> alter database open resetlogs;. statement processed.

Opening the database with resetlogs resolves the problem because the redo logs do not contain any needed redo information as seen by the ORA-00264 error. The problem seems to be caused by an I/O error reading/writing to
the redo log that causes Oracle to not be able to startup with the current state of the LOG

###########################################

Internal error in LGWR during startup w/large LOG_BUFFER

(VMS specific):

Prior to Oracle 7.3.4, when using a large value for the init.ora parameter LOG_BUFFER, LGWR could fail with an internal error. During startup, LGWR would come down with an internal or fatal error, and then DBWR would report these messages:

ORA-00449: background process 'DBWR' unexpectedly terminated with error 470

ORA-00470: LGWR process terminated with error ORA-01092: ORACLE instance terminated. Disconnection forced

This could also happen on a new database, during a log switch.

Although this problem was exacerbated by using large values for LOG_BUFFER, an examination of the code indicates that there was significant risk for this bug even if using the default for LOG_BUFFER. It is perhaps only by chance that the bug was not hit in testing.

Impact:
You cannot startup. In particular, you cannot create a new database, and you cannot migrate an old database from 7.1.

Workaround:
Use the default value for LOG_BUFFER.

(HP-UX specific):

If the database crashes and you receive the following entry in the trace file (please examine alert.log & lgwr_*.log files):

ORA-600 [4414], [0], [0], [12221], [2], [], [], [] ORA-01092: ORACLE instance terminated. Disconnection forced

After looking through background process you may see the following entries:

lgwr_.log

ORA-00447: fatal error in background process
ORA-00313: open failed for members of log group 4 of thread 1
ORA-00312: online log 4 thread 1:
ORA-07362: sfifi: open error, unable to open file.

HP-UX Error: 23: File table overflow

pmon_.trc

ORA-00447: fatal error in background process ORA-00470: LGWR process terminated with error

The cause of the crash is the entry:

HP-UX Error: 23: File table overflow.

Look at the /usr/include/sys/errno.h file and notice that error 23 is specifically related to the nfile o/s parameter:

#define ENFILE 23 /* File table overflow - */

Just bump up the number currently set for that parameter.

Comment from sajwat
Date: 03/23/2001 04:15AM PST
Comment

The OS 2 error indicates that this online redo log file is missing. It must have been moved or deleted? If the file was moved, you can change the location within the database with this command:

alter database rename file 'E:\ORANT\DATABASE\LOGORC15.ORA' to 'NEW LOCATION'; (whatever that path is)

If the file is gone and you can't restore it, you can 'fake' a recovery and open the database with resetlogs which will recreate the online redo log.

svrmgr30> startup mount pfile=
svrmgr30> recover database until cancel; (if prompted for a file, just type cancel) svrmgr30> alter database open resetlogs;

Accepted Answer from bpouydog
Date: 03/23/2001 06:27PM PST
Grade: B
Accepted Answer

Hi,

The best way to proceed with this error is to identify the root cause of this error.

Go to your alert.log file. It is present in the destination specified by background_dump_dest init.ora parameter. Check this file for all the references of ora-470 error. This will tell you which trace files (if any) have been generated for this error. You may have to browse thru the trace files in order to reach the root cause of this error.

The cause can be an Oracle bug, or it can be due to some settings that have been changed - say log_buffers or a datafile has been added. The solution may also be dependent on the OS and Oracle Version you are running on.

This error simply indicates the the background process lgwr has not been able to function normally and has been terminated. This is causing your database to go down, as LGWR is mandatory for the database to be up. The cause as identified above will tell us how to proceed.

I am attaching 2 sample cases for your viewing.

##################################################

CASE 1

Problem Description:

Your database performance is degrading and eventually you crash.

You look in the PMON trace file and find the following errors.

error 470 detected in background process

     00447, 00000, "fatal error in background process"
     // *Cause:  One of the background processes died unexpectedly.
     // *Action: Warm start the system.
ora-470
     00470, 00000, "LGWR process terminated with error"
     // *Cause:  The log writer process died
     // *Action: Warm start instance

DBWR and SMON indicate the same thing.

Example:

The above has been an on going problem for about two weeks. The standard procedure for this particular company is to shutdown the system
every Monday and do a full backup every Sunday with incrementals everynight.
Customer running Oracle applic. with about 5 users on when DB dies.

You are receiving the following errors in PMON, LGWR and SMON trace files:

error 470 detected in background process OPIRIP: Uncaught error 447. Error stack: ORA-00447: fatal error in background process ORA-00470: LGWR process terminated with error

No errors are being reported in the alert.log. You did receive the following
alert.log entry: "ORACLE Instance <name> - Can not allocate log, archival
required"

You are forced to do the following to get the database back up:

SHUTDOWN ABORT
STARTUP NORMAL
SHUTDOWN IMMEDIATE
STARTUP NORMAL

Solution: INCREASE MAXFILES SETTING

Solution Description:

You need to verify and possibly increase your MAXFILES setting.

Solution Explanation:

In example explored in the problem section, the user had recently added new
datafiles to his instance.

On checking the MAXFILES kernal parameter, he discovered that it was set to
60. Checking V$datafiles showed 47 data files. Computing the total of data
files, control files, mirror redo logs, init.ora and alert.log files, total
exceeded the 60 file limit.

CASE 2

Problem Description:

You are having problems with the database crashing. You look in the alert.log
and find no errors or trace files referenced. When you check the bdump

directory, you do find three trace files, a DBWR, a LGWR and a PMON trace file. The DBWR
trace file contains the following errors:

ora-00447 fatal error in background process ora-00470 LGWR process terminated with error

In the LGWR trace file you find:

ora-00447 fatal error in background process
ora-00313 open failed for members of log group <name> of thread <name>
ora-00312 online log <name> thread <num>: <str>
ora-07362 sfifi: open error, unable to open file.
ora-00470 LGWR process terminated with error

In the PMON trace file you find:
ora-00447 fatal error in background process ora-00470 LGWR process terminated with error

Problem Explanation:

The OS kernel parameter 'NFILES' parameter is set too low. When Oracle is
trying to open an online log file, it is failing due to operating system
restraints.
You may have recently upgraded or migrated and added more redo log groups or
mirrors. Thus you are now having this problem where you did not previously.

Solution: INCREASE OS PARAMETER 'NFILES'.

Solution Description:

Have your System Administrator increase the 'NFILES' parameter on the Operation
System.

Solution Explanation:

The OS kernel parameter 'NFILES' parameter is set too low. When Oracle is
trying to open an online log file, it is failing due to operating system
restraints. Received on Thu Oct 12 2006 - 14:20:39 CDT