Re: Veritas issue with 10G RAC

From: Alex Gorbachev <>
Date: Thu, 8 Jun 2006 20:51:33 +0200
Message-ID: <>

We were on the HP-UX platform (3 large nodes) and had issues with Veritas I/O "hanging" for a while. We have since identified the root cause of the VxVM problems.
Now, the outage of the whole cluster was caused by I/O hanging on the voting disk: there is a voting disk timeout derived from misscount (in our case, misscount-15). To address this there is a one-off patch (we were on but I think patchset didn't have it included either) that allows you to define the disk timeout independently of misscount. This one-off is included in patchset, AFAIK. This is not a platform-specific issue. Check note 294430.1.
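
To make the timeout arithmetic concrete, here is a rough sketch in Python. It only models the relationship described above (disk timeout derived as misscount-15, or set independently once the patch is applied). The function names and the example values (60, 50, 200 seconds) are made up for illustration and are not our actual settings or anything from Oracle's tooling.

```python
from typing import Optional


def voting_disk_timeout(misscount: int,
                        disktimeout: Optional[int] = None,
                        margin: int = 15) -> int:
    """Return the effective voting disk I/O timeout in seconds.

    Hypothetical helper: with no independent disk timeout, the value is
    derived from misscount (misscount - 15, as in the case described in
    this thread). If an independent disk timeout is supplied, it is used
    instead and misscount no longer matters for voting disk I/O.
    """
    if disktimeout is not None:
        return disktimeout
    return misscount - margin


def io_hang_exceeds_timeout(hang_seconds: int,
                            misscount: int,
                            disktimeout: Optional[int] = None) -> bool:
    """True if a voting disk I/O hang of hang_seconds outlasts the timeout."""
    return hang_seconds > voting_disk_timeout(misscount, disktimeout)


# Illustrative values only (assumptions, not real cluster settings).
print(voting_disk_timeout(60))                            # derived from misscount
print(io_hang_exceeds_timeout(50, 60))                    # hang outlasts derived timeout
print(io_hang_exceeds_timeout(50, 60, disktimeout=200))   # independent timeout tolerates it
```

The point of the sketch is only that, without the patch, a storage hang longer than misscount-15 is enough to trip the voting disk timeout, while an independently set disk timeout lets you tolerate longer storage hangs without touching misscount.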

Some benefits of chopping up one huge SMP that Kevin asked for:
--- Manageability: you can shut part of your system down for maintenance. However, 6 domains might be somewhat excessive.
--- Local failure tolerance. Well, I don't know if isolated hardware issues on a single domain happen often, but software is not ideal anyway, so nodes do go down. On the other hand, you have a much higher chance of hitting a problem specific to a clustered environment.
--- Limitations in scalability of a single SMP machine. Perhaps this is a bit duff, since SMP scales better up to a certain number of CPUs (someone please correct me if I am wrong). Up to a "certain" number of CPUs: is that the max you can have in a single SMP? :-)

2006/6/8, fairlie rego <>:
> Hi all,
> Environment
> =========
> 6 node 10G RAC on Solaris 2.9 E25K domains with patchset with Veritas SFRAC 4.1 and Vxfs
> We've had the following problem on 2 nodes where we lose
> the entire Veritas file system across the cluster after doing the following rendering the whole cluster being unavailable.....

Best regards,
Alex Gorbachev
Received on Thu Jun 08 2006 - 13:51:33 CDT

