Re: Waiting for smon to disable tx recovery. 9i on unix.

From: Mark D Powell <Mark.Powell_at_eds.com>
Date: 9 Jun 2006 17:13:03 -0700
Message-ID: <1149898383.677471.71880@i40g2000cwc.googlegroups.com>

dba_222_at_yahoo.com wrote:
> Dear experts,
>
> I have some strange symptoms these past two days on our 9i
> database on unix.
>
> Yesterday, I went to do an update of a large table. About 7
> million rows. It continued overnight. I killed the session
> in the morning.
>
> Previously, when I killed a session like that, I would
> watch it rollback in V$transaction. However, this time, there
> was no transaction to be seen. When I went to count the table,
> it took 15 minutes, when it typically takes 15 seconds. I
> killed that session too.
>
> Later, I saw the same query in active processes. But the
> sessions were not in v$session. I killed the ghost sessions too.
>
> Finally, I decided to bounce the database, and did a shutdown
> immediate.
> It hung. I logged into another session with sys, and again
> tried a shutdown immediate. I got:
> ORA-24324: service handle not initialized
> ORA-24323: value not allowed
> ORA-01089: immediate shutdown in progress - no operations are permitted
>
> I'll include the alert log, and the latest trace files.
>
> Trace repeatedly shows:
> Waiting for smon to disable tx recovery.
>
> The alert log repeatedly shows:
> Thread 1 advanced to log sequence 6501
>
> It has been doing this for over 6 hours now.
>
> top
> - shows the dbw0, and smon working about 1 to 1.5% of the cpu.
>
>
> File timestamps:
> - The control files, and some datafiles (including undo) are
> continuously
> updated with the current timestamp. One redo log, the same as in
> the alert log, has the current timestamp. The other redo logs
> have older timestamp.
>
> It all seems to be recovering. But when will it end?
>
> I've seen before, when I've done an abort on something similar,
> that you still have to wait for the whole process to recover.
> So, you don't really gain anything in that case by aborting.
>
> Has anyone seen anything like this?
> What do you think is the best course of action?
>
> Just let it run, and it will finally shutdown?
> Kill the SMON process?
> Shutdown abort from another session?
> Something else?
>
> Thanks a lot!
>
>
> Alert log:
>
>
> Fri Jun 9 10:52:27 2006
> Thread 1 advanced to log sequence 6488
> Current log# 1 seq# 6488 mem# 0:
> /u01/oracle/OraHome1/oradata/the_db/redo01.log
> Fri Jun 9 11:52:10 2006
> SMON: Restarting fast_start parallel rollback
> Fri Jun 9 11:54:03 2006
> SMON: ignoring slave err,downgrading to serial rollback
>
>
> Fri Jun 9 12:10:13 2006
> Shutting down instance: further logons disabled
> Fri Jun 9 12:14:21 2006
> Shutting down instance (immediate)
> License high water mark = 22
> Waiting for dispatcher 'D000' to shutdown
> All dispatchers and shared servers shutdown
> Fri Jun 9 12:14:23 2006
> ALTER DATABASE CLOSE NORMAL
> Fri Jun 9 12:14:58 2006
> Thread 1 advanced to log sequence 6489
> Current log# 2 seq# 6489 mem# 0:
> /u01/oracle/OraHome1/oradata/the_db/redo02.log
> Fri Jun 9 12:19:36 2006
> Waiting for smon to disable tx recovery.
> Fri Jun 9 12:37:05 2006
> Thread 1 advanced to log sequence 6490
> Current log# 3 seq# 6490 mem# 0:
> /u01/oracle/OraHome1/oradata/the_db/redo03.log
> Fri Jun 9 12:57:36 2006
> Thread 1 advanced to log sequence 6491
> Current log# 1 seq# 6491 mem# 0:
> /u01/oracle/OraHome1/oradata/the_db/redo01.log
>
>
> ...
>
> Fri Jun 9 16:35:56 2006
> Thread 1 advanced to log sequence 6501
> Current log# 2 seq# 6501 mem# 0:
> /u01/oracle/OraHome1/oradata/the_db/redo02.log
> Fri Jun 9 16:56:26 2006
> Thread 1 advanced to log sequence 6502
> Current log# 3 seq# 6502 mem# 0:
> /u01/oracle/OraHome1/oradata/the_db/redo03.log
> Fri Jun 9 17:16:09 2006
> Thread 1 advanced to log sequence 6503
> Current log# 1 seq# 6503 mem# 0:
> /u01/oracle/OraHome1/oradata/the_db/redo01.log
>
>
>
> Trace file:
>
>
> /u01/oracle/OraHome1/admin/the_db/udump/the_db_ora_26493.trc
> Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.3.0 - Production
> ORACLE_HOME = /u01/oracle/OraHome1
> System name: SunOS
> Node name: the_unix_server
> Release: 5.8
> Version: Generic_108528-29
> Machine: sun4u
> Instance name: the_db
> Redo thread mounted by this instance: 1
> Oracle process number: 14
> Unix process pid: 26493, image: oracle_at_the_unix_server (TNS V1-V3)
>
> *** SESSION ID:(15.453) 2006-06-09 12:19:36.865
> Waiting for smon to disable tx recovery.
> *** 2006-06-09 12:24:47.969
> Waiting for smon to disable tx recovery.
> *** 2006-06-09 12:29:59.031
>
> ...
>
> *** 2006-06-09 17:20:19.487
> Waiting for smon to disable tx recovery.
> *** 2006-06-09 17:25:30.551
> Waiting for smon to disable tx recovery.
> *** 2006-06-09 17:30:41.613
> Waiting for smon to disable tx recovery.

How did you kill the session? If you used alter system kill session it is possible that the Oracle background process still existed. Did you check the session to see what it was doing, v$session_wait, v$sql, v$transaction prior to killing it? Did you look at the plan to see if it was performing the task in an inefficient manner?

Oracle has been opening the database and beginning new work before all backouts are complete for several versions. If a user session attempts to access data that needs backing out to a consistent state then that session is used to perform the backout. In other words the reader pays the price for fast startup and opening.

Killing the session and bouncing the database just adds to the work that needs to be done. Being that you are on a crappy version, 9.2.0.3, you could also be hitting a bug. I would let the database just run for a while and take the time to search metalink for any related issues. But in general if a transaction takes 7 hours to do work it could take more than 7 hours to back it out.

Next time before killing something see what it is doing if anything and recored the front and back-end pids. Also check the alert log and trace direcories to make sure nothing bad is going on or has happened.

HTH -- Mark D Powell -- Received on Fri Jun 09 2006 - 19:13:03 CDT