Re: Server unexpect shutdown
Date: Thu, 16 Oct 2008 20:43:56 -0700
Message-ID: <1224215031.834593@bubbleator.drizzle.com>
GLADtr wrote:
> Hi all.
> Help me please!
> What could this be?
>
> SYSTEM REDHAT 4
> uname -a
> Linux edw1 2.6.9-67.0.0.0.1.ELsmp #1 SMP Sun Nov 18 01:06:10 EST 2007
> x86_64 x86_64 x86_64 GNU/Linux
> Oracle 10.2.0.4
>
> I have looked through all the log files.
> One suspicious entry in dmesg:
>
> ACPI: Processor [CPU0] (supports C1)
> ACPI: Processor [CPU1] (supports C1)
> ACPI: Processor [CPU6] (supports C1)
> ACPI: Processor [CPU7] (supports C1)
> ACPI: Processor [CPU8] (supports C1)
> ACPI: Processor [CPU9] (supports C1)
> ACPI: Processor [CPUE] (supports C1)
> ACPI: Processor [CPUF] (supports C1)
> ________________________________________________
> Losing some ticks... checking if CPU frequency changed.
> ------------------------------------------------------------------------------------
> microcode: CPU0 already at revision 0x16 (current=0x16)
> microcode: CPU1 already at revision 0x16 (current=0x16)
> microcode: CPU2 already at revision 0x16 (current=0x16)
> microcode: CPU3 already at revision 0x16 (current=0x16)
> microcode: CPU4 already at revision 0x16 (current=0x16)
> microcode: CPU5 already at revision 0x16 (current=0x16)
> microcode: CPU6 already at revision 0x16 (current=0x16)
> microcode: CPU7 already at revision 0x16 (current=0x16)
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> and
>
> one error found in the mcelog:
>
> MCE 0
> CPU 7 BANK 4 TSC 4f58cf631fca4
> MCG status:
> MCi status:
> Error enabled
> MCA:BUS Generic Generic Generic Other-transaction Request-timeout
> Error
> Model:Pad address glitch
>
> STATUS 9000080110800e0f MCGSTATUS 0
> MCE 1
> CPU 6 BANK 4 TSC 4f58cf6321e0a
> MCG status:
> MCi status:
> Error enabled
> MCA:BUS Generic Generic Generic Other-transaction Request-timeout
> Error
> Model:Pad address glitch
>
> STATUS 9000080110800e0f MCGSTATUS 0
> MCE 0
> CPU 9 BANK 4 TSC 960e0d4f7b60
> MCG status:
> MCi status:
> Error enabled
> MCA:BUS Generic Generic Generic Other-transaction Request-timeout
> Error
> Model:Pad address glitch
>
> STATUS 9000080210800e0f MCGSTATUS 0
> MCE 1
> CPU 8 BANK 4 TSC 960e0d4f7e26
> MCG status:
> MCi status:
> Error enabled
> MCA:BUS Generic Generic Generic Other-transaction Request-timeout
> Error
> Model:Pad address glitch
>
> STATUS 9000080210800e0f MCGSTATUS 0
> MCE 0
> CPU 1 BANK 0 TSC 2105f9ef4c1fc
> MISC 14000240002a4 ADDR 208c54080
> MCG status:
> MCi status:
> Error overflow
> MCi_MISC register valid
> MCi_ADDR register valid
> MCA:Generic CACHE Level-1 Snoop Error
> STATUS cc00000120040189 MCGSTATUS 0
> MCE 1
> CPU 1 BANK 1 TSC 2105f9ef4ee2a
> MCG status:
> MCi status:
> MCA:Data CACHE Level-1 Data-Read Error
> STATUS 8000001800000135 MCGSTATUS 0
> /var/log/messages
> Sep 29 10:30:33 edw1 kernel: SMB connection re-established (-5)
> Sep 29 10:32:54 edw1 sshd(pam_unix)[19592]: session closed for user
> oracle
> Sep 29 11:04:51 edw1 sshd(pam_unix)[2505]: session opened for user
> root by (uid=0)
> Sep 29 11:05:22 edw1 observiced: observiced shutdown succeeded
> Sep 29 11:05:29 edw1 observiced: observiced startup succeeded
> Sep 29 11:05:57 edw1 sshd(pam_unix)[2505]: session closed for user
> root
> Sep 29 11:42:11 edw1 sshd(pam_unix)[17458]: session opened for user
> oracle by (uid=0)
>
> Sep 29 11:45:27 edw1 su(pam_unix)[18877]: session opened for user root
> by oracle(uid=200)
> Sep 29 11:45:28 edw1 observiced: observiced shutdown succeeded
> Sep 29 11:53:44 edw1 su(pam_unix)[18877]: session closed for user root
> Sep 29 11:53:49 edw1 kernel: SMB connection re-established (-5)
>
> Oct 5 04:02:03 edw1 kernel: SMB connection re-established (-5)
> Oct 5 04:02:05 edw1 snmpd[7107]: Received TERM or STOP signal...
> shutting down...
> Oct 5 04:02:05 edw1 snmpd: snmpd shutdown succeeded
> Oct 5 04:02:05 edw1 snmpd: snmpd startup succeeded
My goodness, that is a whole lot of irrelevant information. <g>
Which Red Hat 4 update?
Is this a new issue or one that existed before applying the 10.2.0.4 patch?
Have you verified kernel parameters per the Oracle install docs?
When does it shut down?
What else is installed?
What is happening when it shuts down?
Are the shutdowns random or predictable?
Does the server have a remote management card?
Will it shut itself down if Oracle is not started?
Is there a core dump?
The more information you provide, the better.
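
On the kernel parameter question, here is a rough sketch of one way to compare the running values against documented minimums. Treat everything in it as an assumption: the minimums in REQUIRED are the 10gR2 figures as I remember them and must be confirmed against the install guide for your release, the helper names read_param and shortfalls are made up for illustration, and it needs a reasonably recent Python 3, which a stock Red Hat 4 box will not have.

```python
from pathlib import Path

# ASSUMED minimums, for illustration only. Confirm every value against
# the Oracle installation guide for your exact release before relying on it.
REQUIRED = {
    "kernel.shmmni": [4096],
    "fs.file-max": [65536],
    "kernel.sem": [250, 32000, 100, 128],
}


def read_param(name, root="/proc/sys"):
    """Return the integer fields of a kernel parameter such as kernel.sem."""
    path = Path(root) / name.replace(".", "/")
    return [int(field) for field in path.read_text().split()]


def shortfalls(current, required):
    """Return (name, current, minimum) for each parameter below its minimum.

    A parameter that is missing, or has the wrong number of fields,
    is reported as a shortfall as well.
    """
    problems = []
    for name, minimum in required.items():
        values = current.get(name)
        if (
            values is None
            or len(values) != len(minimum)
            or any(v < m for v, m in zip(values, minimum))
        ):
            problems.append((name, values, minimum))
    return problems


if __name__ == "__main__":
    current = {}
    for name in REQUIRED:
        try:
            current[name] = read_param(name)
        except (OSError, ValueError):
            current[name] = None  # unreadable here; reported as a shortfall
    for name, values, minimum in shortfalls(current, REQUIRED):
        print(f"{name}: current={values} minimum={minimum}")
```

If it prints nothing, the parameters listed in REQUIRED are at or above the assumed minimums. That says nothing about the parameters not listed, so check the full set in the install docs.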
-- 
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Received on Thu Oct 16 2008 - 22:43:56 CDT