Re: CRS-1615:voting device hang at 50% fatal, termination in 99620 ms

From: David Barbour <david.barbour1_at_gmail.com>
Date: Thu, 25 Aug 2011 17:14:42 -0500
Message-ID: <CAFH+iffX5qANz-J63jdriUdcQfFnK9YiSCGjNYY12vKokAGbrw_at_mail.gmail.com>



Anything in /var/log/messages?
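
A rough sketch of how one might pull the interesting lines out of that file around the eviction time. The keyword list, the default time window, and the function name `suspicious_lines` are my own assumptions for illustration, not something taken from this thread; adjust them to the storage stack actually in use. It only compares the HH:MM:SS part of each line, so it does not distinguish between days.

```python
import re

# Assumed keywords: strings that commonly show up for storage or cluster
# interconnect trouble. This list is illustrative, not exhaustive.
KEYWORDS = ("ocfs2", "o2net", "o2hb", "scsi", "multipath", "I/O error")

# Matches the HH:MM:SS field of a classic syslog line,
# e.g. "Aug 25 10:38:33 host kernel: ...".
STAMP = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def suspicious_lines(lines, start="10:30:00", end="10:40:00", keywords=KEYWORDS):
    """Return lines whose time lies in [start, end] and that mention a keyword."""
    hits = []
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue  # no timestamp found, skip the line
        # Zero-padded HH:MM:SS strings compare correctly as plain strings.
        if not (start <= match.group(1) <= end):
            continue
        lowered = line.lower()
        if any(keyword.lower() in lowered for keyword in keywords):
            hits.append(line.rstrip("\n"))
    return hits
```

It could be called as `suspicious_lines(open("/var/log/messages", errors="replace"))` on the node that was evicted, printing each returned line.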

On Thu, Aug 25, 2011 at 5:42 AM, Marko Sutic <marko.sutic_at_gmail.com> wrote:

> Freek,
>
> you are correct - heartbeat fatal messages are there due to the missing
> voting disk.
>
> I have another database up and running on the second node, and this database
> is using the same OCFS2 volume for Oracle database files as the first one.
> This database is running without any error, so I suppose that the other OCFS2
> volumes were accessible at the time of the failure.
>
> This configuration has 3 voting disk files located on 3 different LUNs
> and separate OCFS2 volumes. When the failure occurs, two of the three voting
> devices hang.
>
> It is also worth mentioning that nothing else is running on that node
> except the import.
>
>
> I simply can't figure out why two of the three voting disks hang.
>
>
> Regards,
> Marko
>
>
> On Thu, Aug 25, 2011 at 11:08 AM, D'Hooge Freek <Freek.DHooge_at_uptime.be> wrote:
>
>> Marko,
>>
>> I don't know the error timings for the other node, but I think the
>> heartbeat fatal messages are coming after the first node has terminated due
>> to the missing voting disk.
>>
>> This would indicate that there is no general problem with the voting disk
>> itself, but that the problem is specific to the first node.
>> Either the connection itself or the load or an ocfs2 bug would then be the
>> cause of the error.
>>
>> Do you know if at the time of the failure the other OCFS2 volumes were
>> still accessible?
>> Are your voting disks placed on the same LUNs as your database files or
>> are they on a separate OCFS2 volume?
>>
>> Regards,
>>
>>
>> Freek D'Hooge
>> Uptime
>> Oracle Database Administrator
>> email: freek.dhooge_at_uptime.be
>> tel +32(0)3 451 23 82
>> http://www.uptime.be
>> disclaimer: www.uptime.be/disclaimer
>> ---
>> From: Marko Sutic [mailto:marko.sutic_at_gmail.com]
>> Sent: Thursday 25 August 2011 10:51
>> To: D'Hooge Freek
>> Cc: oracle-l_at_freelists.org
>> Subject: Re: CRS-1615:voting device hang at 50% fatal, termination in
>> 99620 ms
>>
>> Error messages from the other node:
>>
>> 2011-08-25 10:38:33.563
>> [cssd(18117)]CRS-1612:node l01ora3 (1) at 50% heartbeat fatal, eviction in 14.000 seconds
>> 2011-08-25 10:38:40.558
>> [cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in 7.010 seconds
>> 2011-08-25 10:38:41.560
>> [cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in 6.010 seconds
>> 2011-08-25 10:38:45.558
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 2.010 seconds
>> 2011-08-25 10:38:46.560
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 1.010 seconds
>> 2011-08-25 10:38:47.562
>> [cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 0.010 seconds
>> 2011-08-25 10:38:47.574
>> [cssd(18117)]CRS-1607:CSSD evicting node l01ora3. Details in /u01/app/crs/log/l01ora4/cssd/ocssd.log.
>> 2011-08-25 10:39:01.579
>> [cssd(18117)]CRS-1601:CSSD Reconfiguration complete. Active nodes are l01ora4 .
>>
>>
>> Regards,
>> Marko
>>
>
>
>
> --
> Marko Sutic, dipl.ing.rač.
> My LinkedIn Profile <http://hr.linkedin.com/in/markosutic>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Aug 25 2011 - 17:14:42 CDT
