Re: DBA Alert

From: Joseph S. Testa <teci_at_oracle-dba.com>
Date: Thu, 28 Sep 2000 08:47:58 -0400
Message-Id: <10633.118140@fatcity.com>


here it is in its entirety:

Note:118228.1
  Subject:            ALERT: Hang During Startup/Shutdown on Unix When System Uptime > 248 Days
  Type:               ALERT
  Status:             PUBLISHED
  Content Type:       TEXT/PLAIN
  Creation Date:      24-AUG-2000
  Last Revision Date: 25-SEP-2000
  Language:           USAENG

  This alert was modified 30-August-2000 by adding the Q&A section.
  This alert was modified 31-August-2000 by removing 7.x from the
  Versions Affected. Q&A section updated with new information regarding
  affected platforms and product versions.
  This alert was modified 21-September by adding reference to BUG 1399885.
  This alert was modified 25-September-2000 by adding a Q&A on corruption and
  revision of the affected platforms Q&A.

  HANG DURING STARTUP OR SHUTDOWN



  Versions Affected

    Oracle Server and Oracle Enterprise Server 8.0.X through 8.1.6

  Platforms Affected



    UNIX GENERIC

  Description

    Due to Oracle [BUG:1084273] a timer overflow will cause background
    processes to loop indefinitely. A timer overflow occurs when the number
    of clock ticks exceeds the largest positive value representable in the
    datatype in which the count is stored.

  Likelihood of Occurrence



    Operating systems with clock ticks set to milliseconds will see this
    problem after 24.8 days, but systems typically have clock ticks set
    to centiseconds, which means the problem would not be seen for 248 days.
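
    The two figures follow from simple arithmetic on a signed 32-bit tick
    counter (the width implied by the wrap value of 2147483647 quoted in the
    Q&A section). The Python sketch below is an illustration only and is not
    part of the Oracle note; the function name is made up for the example.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def days_until_overflow(ticks_per_second: int) -> float:
    """Days of uptime before a signed 32-bit tick counter wraps."""
    seconds = (INT32_MAX + 1) / ticks_per_second
    return seconds / 86400  # 86400 seconds per day


# 1000 ticks/s = millisecond ticks, 100 ticks/s = centisecond ticks
for hz in (1000, 100):
    print(f"{hz} ticks/s: {days_until_overflow(hz):.2f} days")
```

    This gives roughly 24.86 and 248.55 days, which the alert quotes as
    24.8 and 248 days.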

  Possible Symptoms



    After the number of clock ticks since the machine was last rebooted
    overflows, you will not be able to shut down or start up the affected
    Oracle RDBMS products. If your operating system provides a system call
    trace such as Sun's truss utility, you can check for the behavior.

      % truss -af -o <output_file> -p <pid_of_pmon_process>

    If you see output similar to the following you are looping and may be
    experiencing this problem:

     24369:  semop(720897, 0xEFFFE7A0, 1)    (sleeping...)
     24448:      Received signal #14, SIGALRM, in semop() [caught]
     24448:  semop(720897, 0xEFFFDAA8, 1)                    Err#91 ERESTART
     24448:  sigprocmask(SIG_BLOCK, 0xEFFFD6C0, 0x00000000)  = 0
     24448:  times(0xEFFFD650)                               = -2117821797
     24448:  setitimer(ITIMER_REAL, 0xEFFFD650, 0x00000000)  = 0
     24448:  sigprocmask(SIG_UNBLOCK, 0xEFFFD6C0, 0x00000000) = 0
     24448:  setcontext(0xEFFFD790)
     24377:  semop(720897, 0xEFFFE7A0, 1)    (sleeping...)
     24398:      Received signal #14, SIGALRM, in semop() [caught]
     24398:  semop(720897, 0xEFFFE7A0, 1)                    Err#91 ERESTART
     24398:  sigprocmask(SIG_BLOCK, 0xEFFFE3B8, 0x00000000)  = 0
     24398:  times(0xEFFFE348)                               = -2117821686

    Note that the "times" call is returning a large negative value.
    It is also important to understand that the negative values returned by
    "times" are not themselves the problem; the issue is how the Oracle timer
    checks against them. It is normal to see negative values returned by
    "times" on systems that have been up over 248 days.
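
    As a rough cross-check, a negative "times" value can be read back as an
    uptime by reinterpreting it as an unsigned 32-bit tick count. The sketch
    below is not from the Oracle note. It assumes a 32-bit counter that has
    wrapped only once and 100 clock ticks per second on the traced machine;
    neither is stated for that system, and the helper names are hypothetical.

```python
def ticks_from_signed(value: int, bits: int = 32) -> int:
    """Reinterpret a signed counter value as an unsigned tick count."""
    return value % (1 << bits)


def uptime_days(times_return: int, ticks_per_second: int = 100) -> float:
    """Approximate uptime in days; 100 ticks/s is an assumption."""
    return ticks_from_signed(times_return) / ticks_per_second / 86400


# First times() return value from the truss output above
print(f"{uptime_days(-2117821797):.2f} days")
```

    Under those assumptions the traced value corresponds to about 252 days
    of uptime, which is consistent with a system that has passed the
    248-day mark.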

    Any running instances will need to be aborted with a "shutdown abort"
    before the system is shut down.

  Questions & Answers



  Q. Can this bug cause database corruptions?
  A. No. This bug does NOT cause database corruptions. It causes a hang. To
      resolve the hang, reboot the system. Do not restore the database from a
      backup or recreate the controlfile in an attempt to resolve the hang.

  Q. What about the reports of controlfile corruption?

  A. In an attempt to resolve a hang on ALTER DATABASE MOUNT, a user recreated
      the controlfile of their database. Even though this allowed the MOUNT to
      proceed, it didn't solve the hang at later points. There was nothing wrong
      with the controlfile. After recreating it, the MOUNT process followed a
      path which did not result in a hang. This is NOT the correct workaround
      for this problem, and may result in loss of valuable metadata. See the
      Workaround section below for the correct workaround.

  Q. Which platforms are known to be affected by this bug?

  A. This bug is known to affect the Sun Solaris SPARC platform, 32-bit and
      64-bit. We have checked the following platforms and found that they are
      not affected by this bug: IBM AIX/SP (uses post/wait driver), HPUX (uses
      gettimeofday), Compaq Tru64 (uses gettimeofday), Linux (uses
      gettimeofday).

      Platforms not mentioned here have not been verified yet.

  Q. Does this problem affect Oracle7?

  A. This problem does not affect Oracle7 on any platform.

  Q. How do I know whether my system will be impacted 24 days or 248 days
      after reboot?

  A. On Solaris, the command "/usr/bin/getconf CLK_TCK" will return the number
      of clock ticks per second. If the number is 1000, the system is impacted
      24 days after reboot. If the number is 100, the system is impacted 248
      days after reboot.
      Another way to determine the clock ticks of your system is to use the
      command:
              "truss -tsysconfig time true"

      On Solaris this will show:
              sysconfig(_CONFIG_CLK_TCK)                      = 1000
      if your system clock ticks 1000 times per second.
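
      The same value can also be read programmatically. The Python sketch
      below is an illustration, not part of the Oracle note. It uses the
      standard os.sysconf call with the POSIX name SC_CLK_TCK, which is
      assumed here to report the same figure as "getconf CLK_TCK"; the
      function name is hypothetical.

```python
import os


def clock_ticks_per_second() -> int:
    """Return the CLK_TCK value reported by the OS (POSIX systems only)."""
    return os.sysconf("SC_CLK_TCK")


if __name__ == "__main__":
    print(f"CLK_TCK = {clock_ticks_per_second()}")
```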

  Q. How do I know whether my system is close to being impacted?

  A. Use the uptime command to determine how long the system has been running.
      Based on the number of clock ticks per second, you will be able to tell
      when the system will be impacted. Alternatively, you can see the current
      return value of the times system call using this command:

           "truss -ttimes time true"

      On Solaris this will show:
           times(0xEFFFFC20)                               = 962090

      At 1000 clock ticks per second, this system was rebooted 962.09 seconds
      ago. After the return value of the times() system call reaches
      2147483647, it will wrap and become -2147483648. When times() returns a
      negative value, your system has been impacted.
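
      The time left before the wrap can be estimated from the current
      times() value and the clock tick rate. The Python sketch below is an
      illustration only, not part of the Oracle note; it assumes the signed
      32-bit counter described above, and the function name is hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the last value before the wrap


def days_until_wrap(times_return: int, ticks_per_second: int) -> float:
    """Days left before times() wraps; 0.0 if it is already negative."""
    if times_return < 0:
        return 0.0  # already wrapped, system is impacted
    remaining_ticks = INT32_MAX - times_return + 1
    return remaining_ticks / ticks_per_second / 86400


# Value from the truss example above, at 1000 ticks per second
print(f"{days_until_wrap(962090, 1000):.2f} days left")
```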

  Q. How do I know that the patch fixed the problem?

  A. When your system has passed the impact date without the patch installed,
      shutdown abort a test database on the system and try to start it up. If
      the startup hangs during the mount phase, you have encountered the
      problem. Install the patch on the test database and retry the startup.
      It should now start up without hanging.

  Q. Can the fix for 8.0.6.0 also be applied to 8.0.6.1?

  A. Yes. This patch is for the CORE library, which is not part of the patchsets.

  Q. Can the fix for 8.1.6.0 also be applied to 8.1.6.1 and 8.1.6.2?

  A. Yes. This patch is for the CORE library, which is not part of the patchsets.

  Workaround



    The fix for this issue is incorporated into the 8.1.7 and newer
    releases of Oracle Server and Oracle Enterprise Server.

    If a patch is not available for your operating system or Oracle
    version (see the patch list in the next section), the workaround
    is to do the following:

  1. Stop all running databases on the server. You may have to issue a
      SHUTDOWN ABORT command.
  2. Reboot the server. Follow the normal OS procedure to reboot the server;
      do not simply power off the machine.

    This will reset the timer and restart the tick count at 0.

  Patches



    As of September 25, 2000 there are four patches available for Sun Solaris
    32 bit:
     8.0.5: [BUG:1400358]
     8.0.6: [BUG:1265297]
     8.1.5: [BUG:1400327]
     8.1.6: [BUG:1227119]

    As of September 25, 2000 there is one patch available for Sun Solaris
    64 bit:

Received on Thu Sep 28 2000 - 07:47:58 CDT
