Re: CRS-1615:voting device hang at 50% fatal, termination in 99620 ms

From: Marko Sutic <marko.sutic_at_gmail.com>
Date: Thu, 25 Aug 2011 10:51:00 +0200
Message-ID: <CAMD6WPdg_r49BJJZAySsmAX+qNkB5JBCOYuidrapE7cVmn=o5w_at_mail.gmail.com>



Errors messages from another node:

2011-08-25 10:38:33.563
[cssd(18117)]CRS-1612:node l01ora3 (1) at 50% heartbeat fatal, eviction in 14.000 seconds
2011-08-25 10:38:40.558
[cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in 7.010 seconds
2011-08-25 10:38:41.560
[cssd(18117)]CRS-1611:node l01ora3 (1) at 75% heartbeat fatal, eviction in 6.010 seconds
2011-08-25 10:38:45.558
[cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 2.010 seconds
2011-08-25 10:38:46.560
[cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 1.010 seconds
2011-08-25 10:38:47.562
[cssd(18117)]CRS-1610:node l01ora3 (1) at 90% heartbeat fatal, eviction in 0.010 seconds
2011-08-25 10:38:47.574
[cssd(18117)]CRS-1607:CSSD evicting node l01ora3. Details in /u01/app/crs/log/l01ora4/cssd/ocssd.log. 2011-08-25 10:39:01.579
[cssd(18117)]CRS-1601:CSSD Reconfiguration complete. Active nodes are l01ora4 .

Regards,
Marko

On Thu, Aug 25, 2011 at 10:48 AM, Marko Sutic <marko.sutic_at_gmail.com> wrote:

> Statistics during import:
>
> [root_at_l01ora3 ~]# sar -u 2 15
> Linux 2.6.18-238.el5 (l01ora3.ot.hr) 08/25/2011
>
> 10:36:55 AM CPU %user %nice %system %iowait %steal
> %idle
> 10:36:57 AM all 5.59 0.00 0.06 3.12 0.00
> 91.22
> 10:36:59 AM all 1.84 0.00 0.09 4.43 0.00
> 93.63
> 10:37:01 AM all 1.50 0.00 0.09 4.81 0.00
> 93.60
> 10:37:03 AM all 1.44 0.00 0.09 4.78 0.00
> 93.69
> 10:37:05 AM all 1.47 0.00 0.41 4.87 0.00
> 93.25
> 10:37:07 AM all 1.47 0.00 0.06 4.75 0.00
> 93.72
> 10:37:09 AM all 1.22 0.00 0.09 5.18 0.00
> 93.51
> 10:37:11 AM all 0.22 0.00 0.03 6.15 0.00
> 93.60
> 10:37:13 AM all 0.28 0.00 0.06 8.72 0.00
> 90.94
> 10:37:15 AM all 1.53 0.00 0.19 4.93 0.00
> 93.35
> 10:37:17 AM all 1.47 0.00 0.09 4.72 0.00
> 93.72
> 10:37:19 AM all 6.28 0.00 0.06 0.00 0.00
> 93.66
> 10:37:21 AM all 0.31 0.00 0.03 6.03 0.00
> 93.63
> 10:37:23 AM all 0.00 0.00 0.03 11.31 0.00
> 88.66
> 10:37:25 AM all 0.06 0.00 0.06 12.48 0.00
> 87.39
> Average: all 1.65 0.00 0.10 5.75 0.00
> 92.50
>
>
> (I've excluded inactive devices from output)
> $ iostat -xd 5 3
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz
> avgqu-sz await svctm %util
> sda 0.00 21.40 0.00 20.80 0.00 337.60 16.23
> 3.47 167.02 3.59 7.46
> sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00
> sda2 0.00 21.40 0.00 20.80 0.00 337.60 16.23
> 3.47 167.02 3.59 7.46
> sdb 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.17 1.17 0.14
> sdb1 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.17 1.17 0.14
> sdc 0.00 0.00 23.80 0.20 24371.20 6.40 1015.73
> 0.17 7.28 4.46 10.70
> sdc1 0.00 0.00 23.80 0.20 24371.20 6.40 1015.73
> 0.17 7.28 4.46 10.70
> sde 0.00 0.00 0.40 0.60 0.80 0.80 1.60
> 0.00 1.40 1.40 0.14
> sde1 0.00 0.00 0.40 0.60 0.80 0.80 1.60
> 0.00 1.40 1.40 0.14
> sdo 0.00 0.00 22.80 0.00 23347.20 0.00 1024.00
> 0.17 7.36 4.46 10.16
> sdo1 0.00 0.00 22.80 0.00 23347.20 0.00 1024.00
> 0.17 7.36 4.46 10.16
> sdt 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.50 1.50 0.18
> sdt1 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.50 1.50 0.18
> sdu 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 0.67 0.67 0.08
> sdu1 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 0.67 0.67 0.08
> sdx 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.33 1.33 0.16
> sdx1 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.33 1.33 0.16
> sdaa 0.00 0.00 20.40 0.60 20691.20 19.20 986.21
> 0.14 6.56 4.00 8.40
> sdaa1 0.00 0.00 20.40 0.60 20691.20 19.20 986.21
> 0.14 6.56 4.00 8.40
> sdai 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 0.67 0.67 0.08
> sdai1 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 0.67 0.67 0.08
> sdam 0.00 0.00 23.60 0.00 24166.40 0.00 1024.00
> 0.17 7.22 4.45 10.50
> sdam1 0.00 0.00 23.60 0.00 24166.40 0.00 1024.00
> 0.17 7.22 4.45 10.50
> sday 0.00 0.00 24.00 0.00 24576.00 0.00 1024.00
> 0.17 7.19 4.43 10.64
> sday1 0.00 0.00 24.00 0.00 24576.00 0.00 1024.00
> 0.17 7.19 4.43 10.64
> sdaz 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.04
> sdaz1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.04
> sdbb 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.04
> sdbb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.04
> sdbi 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> sdbi1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> sdbk 0.00 0.00 24.00 0.00 24576.00 0.00 1024.00
> 1.16 6.85 41.68 100.02
> sdbk1 0.00 0.00 24.00 0.00 24576.00 0.00 1024.00
> 1.16 6.85 41.68 100.02
> sdbr 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 2.00 0.00 0.00 100.02
> sdbr1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 2.00 0.00 0.00 100.02
> sdbw 0.00 0.00 24.00 0.00 24576.00 0.00 1024.00
> 0.16 6.78 4.24 10.18
> sdbw1 0.00 0.00 24.00 0.00 24576.00 0.00 1024.00
> 0.16 6.78 4.24 10.18
> sdcd 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> sdcd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> sdci 0.00 0.00 23.60 0.40 23968.00 12.80 999.20
> 0.17 7.14 4.38 10.52
> sdci1 0.00 0.00 23.60 0.40 23968.00 12.80 999.20
> 0.17 7.14 4.38 10.52
> sdcm 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 3.00 0.00 0.00 100.02
> sdcm1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 3.00 0.00 0.00 100.02
> dm-0 0.00 0.00 0.00 42.20 0.00 337.60 8.00
> 4.93 116.84 1.77 7.46
> dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> dm-3 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.33 1.33 0.16
> dm-4 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 0.67 0.67 0.08
> dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 3.00 0.00 0.00 100.02
> dm-6 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 0.67 0.67 0.08
> dm-7 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.50 1.50 0.18
> dm-8 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 3.00 0.00 0.00 100.02
> dm-9 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> dm-10 0.00 0.00 0.60 0.80 1.20 1.00 1.57
> 0.00 1.29 1.29 0.18
> dm-11 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> dm-12 0.00 0.00 186.40 1.20 190476.80 38.40 1015.54
> 2.32 7.04 5.33 100.02
> dm-13 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.17 1.17 0.14
> dm-14 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 3.00 0.00 0.00 100.02
> dm-15 0.00 0.00 186.40 1.20 190476.80 38.40 1015.54
> 2.32 7.04 5.33 100.02
> dm-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 3.00 0.00 0.00 100.02
> dm-17 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.17 1.17 0.14
> dm-18 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.33 1.33 0.16
> dm-19 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 1.50 1.50 0.18
> dm-20 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 0.67 0.67 0.08
> dm-21 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> dm-22 0.00 0.00 0.60 0.60 1.20 0.60 1.50
> 0.00 0.67 0.67 0.08
> dm-23 0.00 0.00 0.60 0.80 1.20 1.00 1.57
> 0.00 1.29 1.29 0.18
> dm-24 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
> dm-25 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> 1.00 0.00 0.00 100.02
>
>
> Hm... disk utilization is 100% for several devices.
>
> "dm-12" and "dm-12" are devices with database files.
>
>
> FC:
> 24:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI
> Express HBA (rev 02)
> 24:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI
> Express HBA (rev 02)
>
>
> Regards,
> Marko
>
>
> 2011/8/25 D'Hooge Freek <Freek.DHooge_at_uptime.be>
>
>> Marco,
>>
>> Did you notice a high io wait during the import?
>> Do you see the same error message on the other node (on which the import
>> is not running)?
>>
>>
>> Freek D'Hooge
>> Uptime
>> Oracle Database Administrator
>> email: freek.dhooge_at_uptime.be
>> tel +32(0)3 451 23 82
>> http://www.uptime.be
>> disclaimer: www.uptime.be/disclaimer
>> ----
>> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
>> On Behalf Of Marko Sutic
>> Sent: donderdag 25 augustus 2011 9:08
>> To: oracle-l_at_freelists.org
>> Subject: CRS-1615:voting device hang at 50% fatal, termination in 99620 ms
>>
>> Hello all,
>>
>> I have a problem with diagnosing "CRS-1615 voting device hang" error.
>>
>> OS: OEL 5.6 (64-bit)
>> DB: Oracle 11.1.0.7
>> Storage: IBM XIV
>> HW: Blade server 8 cores with Hyper-Threading enabled (64 GB RAM).
>>
>> This is active/passive failover cluster with OCFS2 used as cluster file
>> system.
>> OCR and VotingDisks are placed on OCFS2 mounts.
>>
>>
>> Problem occurs during "impdp" import of 700GB database from NFS.
>> First part of an import, loading data, is done without any problem.
>>
>> During an index creation, after hour or two, node restarts due to the
>> problem with availability of voting disks:
>> "[cssd(28868)]CRS-1615:voting device hang at 50% fatal, termination in
>> 99620 ms, disk (0//ocfs2/voting1/votingdisk)"
>>
>> Impdp parameter file:
>> --
>> userid=xxx/yyy
>> directory=export_dir
>> dumpfile=exportdb_%U.dmp
>> logfile=import.log
>> parallel=16
>> exclude=statistics
>> schemas=(..list of shemas...)
>> exclude=DB_LINK
>> --
>>
>> Database is in noarchivelog mode with "memory_target=7621050368" parameter
>> specified.
>> I've noticed that utilization of the server is not significant during
>> import.
>>
>> As this error is reproducible I'm trying to find how to efficiently
>> diagnose this problem and to trace the cause.
>>
>>
>> If you have any suggestions I would appreciate any help.
>>
>>
>> Regards,
>> Marko
>>
>> --
>> Marko Sutic, dipl.ing.rač.
>> My LinkedIn Profile
>>
>
>
>
> --
> Marko Sutic, dipl.ing.rač.
> My LinkedIn Profile <http://hr.linkedin.com/in/markosutic>
>
>

-- 
Marko Sutic, dipl.ing.rač.
My LinkedIn Profile <http://hr.linkedin.com/in/markosutic>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Aug 25 2011 - 03:51:00 CDT

Original text of this message