Re: DB shutdown problem

From: Joe Maloney <jrpm_at_my-deja.com>
Date: Thu, 07 Sep 2000 14:21:07 GMT
Message-ID: <8p887s$kl4$1@nnrp1.deja.com>

There are two other conditions that could lead to your problem.

The first is if you are running any other process that automatically connects to the database (some SNMP processes, OEM, applications, etc.). Shutdown normal and shutdown immediate will wait for those processes to terminate before bringing the database down. If they don't terminate, then...

Also, if they automatically connect, then when you restart, they might reconnect before you can do your shutdown normal. Ever hear of a loop?

A second condition, which I have found on NT, is that depending on the amount of database activity (inserts/updates/deletes) and the number of users, Oracle may take a while to clean up after itself during a shutdown normal (or immediate), applying the data from rollback and redo to the tablespaces. I had one server take 17 hours to shut down. This usually happens on servers where the activity level is high and constant, so the engine doesn't have enough time to do all its cleanup in the background.

In article <8omsfk$sk3$1_at_nnrp1.deja.com>, student <ennaadu_at_my-deja.com> wrote:
> Thanks ethan..
>
> EN Nadu
>
> In article <8ok2mj$igp$1_at_nnrp1.deja.com>,
> Ethan Post <epost1_at_my-deja.com> wrote:
> > I don't know if this is you but just saw it yesterday.
> >
> > -Ethan
> > http://www.freetechnicaltraining.com
> > http://www.gnumetrics.com
> >
> > ************************************
> > This alert was modified 30-August-2000 by adding the Q&A section.
> >
> > HANG DURING STARTUP OR SHUTDOWN
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Versions Affected
> > ~~~~~~~~~~~~~~~~~
> > Oracle Server and Oracle Enterprise Server 7.X through 8.1.6
> >
> > Platforms Affected
> > ~~~~~~~~~~~~~~~~~~
> > UNIX GENERIC
> >
> > Description
> > ~~~~~~~~~~~
> > Due to Oracle [BUG:1084273] a timer overflow will cause background
> > processes to loop indefinitely. A timer overflow occurs when the number
> > of clock ticks exceeds the largest positive value of the datatype in
> > which the value is being stored.
> >
> > Likelihood of Occurrence
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> > Operating systems with clock ticks set to milliseconds will see this
> > problem after 24.8 days, but typically systems have clock ticks set
> > to centiseconds, which means the problem would not be seen for 248
> > days.
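
As a rough check of those two figures, here is the arithmetic as a small sketch. It assumes the tick count is held in a signed 32-bit integer (which the 2147483647 limit quoted further down implies); the function name is just mine for illustration, not anything from Oracle or Solaris.

```python
# Sketch: how long after boot a signed 32-bit tick counter overflows.
# Assumption: the counter is a signed 32-bit integer.
# days_until_overflow is an illustrative name, not an Oracle or Solaris API.
INT32_MAX = 2**31 - 1  # 2147483647, largest positive signed 32-bit value

def days_until_overflow(ticks_per_second: int) -> float:
    """Days of uptime before the tick count exceeds INT32_MAX."""
    seconds = (INT32_MAX + 1) / ticks_per_second
    return seconds / 86400  # 86400 seconds in a day

for hz in (1000, 100):
    print(f"CLK_TCK={hz}: overflow after about {days_until_overflow(hz):.2f} days")
```

Both results land within a fraction of a day of the 24.8 and 248 days given in the alert.
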
> >
> > Possible Symptoms
> > ~~~~~~~~~~~~~~~~~
> > After the number of clock ticks since the machine was last rebooted
> > overflows, you will not be able to shut down or start up the affected
> > Oracle RDBMS products. If your operating system provides a system call
> > trace utility such as Sun's truss, you can check for the behavior.
> >
> > % truss -af -o <output_file> -p <pid_of_pmon_process>
> >
> > If you see output similar to the following, you are looping and may be
> > experiencing this problem:
> >
> > 24369: semop(720897, 0xEFFFE7A0, 1) (sleeping...)
> > 24448: Received signal #14, SIGALRM, in semop() [caught]
> > 24448: semop(720897, 0xEFFFDAA8, 1) Err#91 ERESTART
> > 24448: sigprocmask(SIG_BLOCK, 0xEFFFD6C0, 0x00000000) = 0
> > 24448: times(0xEFFFD650) = -2117821797
> > 24448: setitimer(ITIMER_REAL, 0xEFFFD650, 0x00000000) = 0
> > 24448: sigprocmask(SIG_UNBLOCK, 0xEFFFD6C0, 0x00000000) = 0
> > 24448: setcontext(0xEFFFD790)
> > 24377: semop(720897, 0xEFFFE7A0, 1) (sleeping...)
> > 24398: Received signal #14, SIGALRM, in semop() [caught]
> > 24398: semop(720897, 0xEFFFE7A0, 1) Err#91 ERESTART
> > 24398: sigprocmask(SIG_BLOCK, 0xEFFFE3B8, 0x00000000) = 0
> > 24398: times(0xEFFFE348) = -2117821686
> >
> > Note that the "times" call is returning a large negative value.
> > It is also important to understand that the negative values returned by
> > "times" are not themselves the problem; the issue is how the Oracle timer
> > checks against them. It is normal to see negative values returned by
> > "times" on systems that have been up over 248 days.
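
To see why a long uptime shows up as a negative number, here is a minimal sketch of the wraparound. It assumes the value is stored as a signed 32-bit integer and, when working backwards, that the counter has wrapped only once. The function names are illustrative only.

```python
# Sketch: how a tick count looks once stored in a signed 32-bit integer.
# Assumptions: signed 32-bit storage, and at most one wrap when recovering
# the true count. as_int32 / ticks_from_wrapped are illustrative names.

def as_int32(ticks: int) -> int:
    """Reduce an unbounded tick count to its signed 32-bit representation."""
    return (ticks + 2**31) % 2**32 - 2**31

def ticks_from_wrapped(value: int) -> int:
    """Recover the true tick count from a value that wrapped at most once."""
    return value % 2**32

print(as_int32(2**31 - 1))   # last value before the wrap
print(as_int32(2**31))       # first value after the wrap
wrapped = -2117821797        # times() return value from the truss output above
print(ticks_from_wrapped(wrapped), "ticks since boot")
```

If that machine was running at 100 ticks per second (an assumption, the alert does not say), the recovered count works out to roughly 252 days of uptime, which fits the "over 248 days" remark.
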
> >
> > Any running instances will need to be aborted with a "shutdown abort"
> > before the system is shut down.
> >
> > Questions & Answers
> > ~~~~~~~~~~~~~~~~~~~~~
> > Q. Is this bug really generic? Which platforms are known to be affected
> > by it?
> > A. So far only Solaris is confirmed, and NCR is suspected. Other
> > platforms have not been verified.
> >
> > Q. Does this problem affect Oracle7?
> > A. This problem does not affect Oracle7 on Sun Solaris. Other platforms
> > have not been verified.
> >
> > Q. How do I know whether my system will be impacted 24 days or 248 days
> > after reboot?
> > A. On Solaris, the command "/usr/bin/getconf CLK_TCK" will return the
> > number of clock ticks per second. If the number is 1000, the system is
> > impacted 24 days after reboot. If the number is 100, the system is
> > impacted 248 days after reboot.
> >
> > Another way to determine the clock ticks of your system is to use the
> > command:
> > "truss -tsysconfig time true"
> >
> > On Solaris this will show:
> > sysconfig(_CONFIG_CLK_TCK) = 1000
> > if your system clock ticks 1000 times per second.
> >
> > Q. How do I know whether my system is close to being impacted?
> > A. Use the uptime command to determine how long the system has been
> > running. Based on the number of clock ticks per second, you will be able
> > to tell when the system will be impacted. Alternatively, you can see the
> > current return value of the times system call using this command:
> >
> > "truss -ttimes time true"
> >
> > On Solaris this will show:
> > times(0xEFFFFC20) = 962090
> >
> > With 1000 clock ticks per second, this system was rebooted 962.09
> > seconds ago. After the return value of the times() system call reaches
> > 2147483647, it will wrap and become -2147483648. When times() returns a
> > negative value, your system has been impacted.
> >
> > Q. How do I know that the patch fixed the problem?
> > A. When your system has passed the impact date without the patch
> > installed, shutdown abort a test database on the system and try to
> > start it up. If the startup hangs during the mount phase, you have
> > encountered the problem. Install the patch on the test database and
> > retry the startup. It should now start up fine.
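
Going back to the two answers above about CLK_TCK and the times() return value, the "how close am I" calculation can be sketched as below. This assumes a signed 32-bit counter and a Unix system where Python's os.sysconf is available; seconds_until_wrap and clock_ticks_per_second are names I made up for the example.

```python
# Sketch: estimate the time left before times() wraps negative.
# Assumptions: signed 32-bit tick counter; Unix-only os.sysconf().
# Function names are illustrative, not part of any Oracle tool.
import os

INT32_MAX = 2**31 - 1  # 2147483647

def clock_ticks_per_second() -> int:
    """The same figure 'getconf CLK_TCK' reports, read via sysconf()."""
    return os.sysconf("SC_CLK_TCK")

def seconds_until_wrap(times_value: int, ticks_per_second: int) -> float:
    """Seconds remaining before the tick count passes INT32_MAX."""
    if times_value < 0:
        return 0.0  # already wrapped, the system has been impacted
    return (INT32_MAX + 1 - times_value) / ticks_per_second

# Values from the truss example above: times() = 962090 at 1000 ticks/sec.
remaining = seconds_until_wrap(962090, 1000)
print(f"{remaining / 86400:.2f} days left before times() goes negative")
print("this machine reports", clock_ticks_per_second(), "ticks per second")
```

Feed it the number from "truss -ttimes time true" and the CLK_TCK value for your own box to get a rough reboot deadline.
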
> >
> > Workaround
> > ~~~~~~~~~~
> > The fix for this issue is incorporated into the 8.1.7 and newer
> > releases of Oracle Server and Oracle Enterprise Server.
> >
> > If a patch is not available for your operating system or Oracle
> > version (see the patch list in the next section), the workaround
> > is to do the following:
> >
> > 1) Stop all running databases on the server
> >
> > 2) Reboot the server
> >
> > This will reset the timer and start the number of ticks back at "0".
> >
> > Patches
> > ~~~~~~~
> > As of August 28, 2000 there are two patches available for Sun Solaris
> > 32 bit:
> >
> > 8.0.6: [BUG:1265297]
> >
> > @tcpatch:/u01/patch/SUN_SOLARIS2/8.0.6.0.0/bug1265297
> >
> > 8.1.6: [BUG:1227119]
> >
> > @tcpatch:/u01/patch/SUN_SOLARIS2/8.1.6.0.0/bug1227119
> >
> > References
> > ~~~~~~~~~~
> > DATABASE HANGS AFTER 24 DAYS, LOOPING ON SEMOP CALL [BUG:1084273]
> > @ DO NOT USE TIMES() RETURN VALUE [BUG:1185824]
> >
> > Sent via Deja.com http://www.deja.com/
> > Before you buy.
> >
>
>

--
Joseph R.P. Maloney, CCP,CSP,CDP
MPiR, Inc.
502-451-7404
some witty phrase goes here, I think.



Received on Thu Sep 07 2000 - 09:21:07 CDT