
Re: DB shutdown problem

From: Ethan Post <epost1_at_my-deja.com>
Date: Wed, 30 Aug 2000 22:43:36 GMT
Message-ID: <8ok2mj$igp$1@nnrp1.deja.com>

I don't know if this is you but just saw it yesterday.

-Ethan
http://www.freetechnicaltraining.com
http://www.gnumetrics.com



This alert was modified 30-August-2000 by adding the Q&A section.

HANG DURING STARTUP OR SHUTDOWN



Versions Affected

  Oracle Server and Oracle Enterprise Server 7.X through 8.1.6

Platforms Affected



  UNIX GENERIC

Description

  Due to Oracle [BUG:1084273] a timer overflow will cause background
  processes to loop indefinitely. A timer overflow occurs when the number
  of clock ticks exceeds the largest positive value that the datatype
  storing it can represent.

Likelihood of Occurrence



  Operating systems with clock ticks set to milliseconds will see this
  problem after 24.8 days, but typically systems have clock ticks set
  to centiseconds, which means the problem would not be seen for 248 days.
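
  The two figures above follow from simple arithmetic. The sketch below
  assumes the tick count is held in a signed 32-bit integer, which matches
  the wrap from 2147483647 to -2147483648 described in the Q&A section; the
  function name is illustrative only and is not part of the alert.

```python
# Days until a signed 32-bit tick counter overflows at a given tick rate.
# Assumption: the counter is a signed 32-bit integer (max 2**31 - 1).
INT32_MAX = 2**31 - 1
SECONDS_PER_DAY = 86_400


def days_until_overflow(ticks_per_second: int) -> float:
    """Return the uptime in days after which the tick counter wraps."""
    # The counter wraps on the tick after INT32_MAX, i.e. after 2**31 ticks.
    return (INT32_MAX + 1) / ticks_per_second / SECONDS_PER_DAY


for rate in (1000, 100):  # millisecond ticks, centisecond ticks
    print(f"{rate} ticks/s: {days_until_overflow(rate):.2f} days")
```

  The results are roughly 24.86 and 248.55 days, which the alert quotes as
  24.8 and 248 days.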

Possible Symptoms



  After the number of clock ticks since the machine was last rebooted
  overflows, you will not be able to shut down or start up the affected
  Oracle RDBMS products. If your operating system provides a system call
  trace such as Sun's truss utility, you can check for the behavior.

    % truss -af -o <output_file> -p <pid_of_pmon_process>

  If you see output similar to the following, you are looping and may be
  experiencing this problem:

   24369:  semop(720897, 0xEFFFE7A0, 1)    (sleeping...)
   24448:      Received signal #14, SIGALRM, in semop() [caught]
   24448:  semop(720897, 0xEFFFDAA8, 1)                    Err#91 ERESTART
   24448:  sigprocmask(SIG_BLOCK, 0xEFFFD6C0, 0x00000000)  = 0
   24448:  times(0xEFFFD650)                               = -2117821797
   24448:  setitimer(ITIMER_REAL, 0xEFFFD650, 0x00000000)  = 0
   24448:  sigprocmask(SIG_UNBLOCK, 0xEFFFD6C0, 0x00000000) = 0
   24448:  setcontext(0xEFFFD790)
   24377:  semop(720897, 0xEFFFE7A0, 1)    (sleeping...)
   24398:      Received signal #14, SIGALRM, in semop() [caught]
   24398:  semop(720897, 0xEFFFE7A0, 1)                    Err#91 ERESTART
   24398:  sigprocmask(SIG_BLOCK, 0xEFFFE3B8, 0x00000000)  = 0
   24398:  times(0xEFFFE348)                               = -2117821686

  Note that the "times" call is returning a large negative value. It is
  also important to understand that a negative value returned by "times"
  is not itself the problem; the issue is how the Oracle timer checks
  against it. It is normal to see negative values returned by "times"
  on systems that have been up over 248 days.
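
  If the truss output has been saved to a file, the negative "times"
  return values can be picked out mechanically. The following is a minimal
  sketch that assumes the line layout shown in the trace above; the regular
  expression and function name are illustrative, not part of the alert. As
  noted above, a negative value alone is not proof of the bug, so the result
  is only a hint to be read alongside the looping semop/SIGALRM pattern.

```python
import re

# Matches lines such as:
#    24448:  times(0xEFFFD650)            = -2117821797
# Assumption: the layout follows the Solaris truss sample shown above.
TIMES_RE = re.compile(
    r"^\s*(\d+):\s+times\(0x[0-9A-Fa-f]+\)\s+=\s+(-?\d+)\s*$"
)


def negative_times_calls(truss_lines):
    """Return (pid, return_value) for each times() call that returned < 0."""
    hits = []
    for line in truss_lines:
        match = TIMES_RE.match(line)
        if match and int(match.group(2)) < 0:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


sample = [
    "   24448:  sigprocmask(SIG_BLOCK, 0xEFFFD6C0, 0x00000000)  = 0",
    "   24448:  times(0xEFFFD650)                               = -2117821797",
    "   24448:  setitimer(ITIMER_REAL, 0xEFFFD650, 0x00000000)  = 0",
    "   24398:  times(0xEFFFE348)                               = -2117821686",
]

for pid, value in negative_times_calls(sample):
    print(f"pid {pid}: times() returned {value}")
```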

  Any running instances will need to be aborted with a "shutdown abort"
  before the system is shut down.

Questions & Answers



Q. Is this bug really generic? Which platforms are known to be affected
    by it?
A. So far only Solaris is confirmed, and NCR is suspected. Other
    platforms have not been verified.

Q. Does this problem affect Oracle7?
A. This problem does not affect Oracle7 on Sun Solaris. Other platforms
    have not been verified.

Q. How do I know whether my system will be impacted 24 days or 248 days
    after reboot?
A. On Solaris, the command "/usr/bin/getconf CLK_TCK" will return the
    number of clock ticks per second. If the number is 1000, the system is
    impacted 24 days after reboot. If the number is 100, the system is
    impacted 248 days after reboot.

    Another way to determine the clock ticks of your system is to use the
    command:

            "truss -tsysconfig time true"

    On Solaris this will show:

            sysconfig(_CONFIG_CLK_TCK)                      = 1000

    if your system clock ticks 1000 times per second.
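
    Where Python is available on the host, the same value can be read
    without truss. This is a sketch only: it assumes a Unix platform on
    which Python exposes the "SC_CLK_TCK" name to os.sysconf, it again
    assumes a signed 32-bit tick counter, and the function name is
    hypothetical. It has not been checked against the platforms the alert
    covers.

```python
import os

SECONDS_PER_DAY = 86_400


def clock_tick_report():
    """Return (ticks_per_second, days_until_32bit_overflow) for this host."""
    # SC_CLK_TCK is the clock tick rate that getconf CLK_TCK reports.
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    # Assumption: the tick counter is a signed 32-bit value (2**31 ticks).
    days = 2**31 / ticks_per_second / SECONDS_PER_DAY
    return ticks_per_second, days


ticks, days = clock_tick_report()
print(f"{ticks} ticks per second; counter wraps after about {days:.1f} days")
```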

Q. How do I know whether my system is close to being impacted?
A. Use the uptime command to determine how long the system has been
    running. Based on the number of clock ticks per second, you will be able
    to tell when the system will be impacted. Alternatively, you can see the
    current return value of the times system call using this command:

         "truss -ttimes time true"

    On Solaris this will show:

         times(0xEFFFFC20)                               = 962090

    At 1000 clock ticks per second, this system was rebooted 962.09 seconds
    ago. After the return value of the times() system call reaches
    2147483647, it will wrap and become -2147483648. When times() returns a
    negative value, your system has been impacted.
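
    The wrap and the time remaining before it can be worked out from a
    times() value such as the one above. The sketch below assumes a signed
    32-bit counter, as the 2147483647 limit implies; the helper names are
    illustrative and the tick rate must be supplied from getconf CLK_TCK.

```python
import ctypes

INT32_MAX = 2**31 - 1


def wrap_int32(ticks: int) -> int:
    """Interpret a tick count as a signed 32-bit integer."""
    return ctypes.c_int32(ticks).value


def seconds_until_wrap(current_ticks: int, ticks_per_second: int) -> float:
    """Seconds left before a non-negative times() value goes negative."""
    if current_ticks < 0:
        return 0.0  # the counter has already wrapped
    return (INT32_MAX - current_ticks + 1) / ticks_per_second


print(wrap_int32(INT32_MAX))       # last positive value
print(wrap_int32(INT32_MAX + 1))   # one tick later
remaining = seconds_until_wrap(962090, 1000)
print(f"{remaining / 86_400:.2f} days until the counter wraps")
```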

Q. How do I know that the patch fixed the problem?
A. When your system has passed the impact date without the patch
    installed, shutdown abort a test database on the system and try to
    start it up. If the startup hangs during the mount phase, you have
    encountered the problem. Install the patch on the test database and
    retry the startup. It should now start up fine.

Workaround



  The fix for this issue is incorporated into the 8.1.7 and newer
  releases of Oracle Server and Oracle Enterprise Server.

  If a patch is not available for your operating system or Oracle
  version (see the patch list in the next section), the workaround
  is to do the following:

  1. Stop all running databases on the server
  2. Reboot the server

  This will reset the timer and restart the tick count from "0".

Patches



  As of August 28, 2000 there are two patches available for Sun Solaris
  32 bit:

   8.0.6: [BUG:1265297]
     @tcpatch:/u01/patch/SUN_SOLARIS2/8.0.6.0.0/bug1265297

   8.1.6: [BUG:1227119]
     @tcpatch:/u01/patch/SUN_SOLARIS2/8.1.6.0.0/bug1227119

References



  DATABASE HANGS AFTER 24 DAYS, LOOPING ON SEMOP CALL [BUG:1084273]
  @ DO NOT USE TIMES() RETURN VALUE                   [BUG:1185824]


Received on Wed Aug 30 2000 - 17:43:36 CDT
