Re: Grid (RAC & Standalone) Unexpected Node Reboots Upon Device Path Failures

From: David Barbour <david.barbour1_at_gmail.com>
Date: Mon, 13 Jun 2016 14:06:23 -0500
Message-ID: <CAFH+iffkvbxXjYO_g6=o6VB-di5JbHomg6LDOfSB6O96J6BsfA_at_mail.gmail.com>

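What error messages are you getting in the orarootagent_root logs, i.e. <GRID_HOME>/log/<HOSTNAME>/agent/ohasd/orarootagent_root/orarootagent_root.log and <GRID_HOME>/log/<HOSTNAME>/agent/crsd/orarootagent_root/orarootagent_root.log?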
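A minimal Python sketch for pulling candidate error lines out of both orarootagent_root logs, assuming the 11gR2 layout under <GRID_HOME>/log/<HOSTNAME>/agent/. The keyword pattern `SUSPECT` is an assumption (generic failure words, not a list of known Oracle messages), and the grid home and host name in the usage line are hypothetical placeholders.

```python
import re
from pathlib import Path

# Assumed pattern: generic failure words, not a catalogue of Oracle messages.
SUSPECT = re.compile(r"error|fail|timed? ?out|reboot", re.IGNORECASE)


def agent_logs(grid_home, hostname):
    """Return the ohasd and crsd orarootagent_root log paths (11gR2 layout)."""
    base = Path(grid_home) / "log" / hostname / "agent"
    return [base / daemon / "orarootagent_root" / "orarootagent_root.log"
            for daemon in ("ohasd", "crsd")]


def suspect_lines(lines):
    """Yield (line_number, text) for every line that matches SUSPECT."""
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            yield number, line.rstrip("\n")


def scan(grid_home, hostname):
    """Print matching lines from both logs, noting any log that is missing."""
    for path in agent_logs(grid_home, hostname):
        if not path.exists():
            print(f"missing: {path}")
            continue
        with path.open(errors="replace") as fh:
            for number, text in suspect_lines(fh):
                print(f"{path}:{number}: {text}")


if __name__ == "__main__":
    # Hypothetical grid home and host name; replace with your own.
    scan("/u01/app/11.2.0/grid", "node1")
```

Each hit is printed with its file and line number. Because "fail" also matches routine failover messages, treat the output as a starting point for reading the logs around each hit, not as a diagnosis.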

We were (and still are) running into something similar, and are evaluating the impact of implementing the multipath advice in My Oracle Support Doc ID 2000037.1.

On Mon, Jun 13, 2016 at 1:29 PM, Dimensional DBA <dimensional.dba_at_comcast.net> wrote:

> Other generic notes.
>
> Normally you don't set "queue_if_no_path"; you set "no_path_retry N" instead.
>
> The number can vary, but a standard setting for, say, EMC Symmetrix with UCS
> is:
>
> no_path_retry 6
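To put "no_path_retry 6" in perspective: with a numeric value, DM-MPIO keeps queueing I/O for roughly no_path_retry * polling_interval seconds after the last path fails, and then starts failing I/O back to the layer above (ASM, LVM or the filesystem). The sketch below is an approximation that assumes the multipath default polling_interval of 5 seconds; check what your own multipath.conf sets.

```python
def queueing_window_s(no_path_retry, polling_interval_s=5):
    """Approximate seconds DM-MPIO queues I/O once all paths have failed.

    no_path_retry may be a positive int (number of path-checker retries),
    "fail" (no queueing) or "queue" (queue forever, like queue_if_no_path).
    polling_interval_s defaults to 5, assumed to be the multipath default.
    """
    if no_path_retry == "fail":
        return 0.0
    if no_path_retry == "queue":
        return float("inf")
    retries = int(no_path_retry)  # also accepts the string form from a config file
    if retries < 1:
        raise ValueError("no_path_retry must be a positive int, 'fail' or 'queue'")
    return float(retries * polling_interval_s)


if __name__ == "__main__":
    print(queueing_window_s(6))        # value quoted above: 30.0
    print(queueing_window_s("queue"))  # never stops queueing: inf
```

With the value above that works out to about 30 seconds; "queue" (documented as equivalent to queue_if_no_path) never stops queueing. Either way, the window is worth weighing against how long ASM, LVM and the clusterware are prepared to wait for an I/O.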
>
> Matthew Parker
> Chief Technologist
> Dimensional DBA
> 425-891-7934 (cell)
> D&B 047931344
> CAGE 7J5S7
> Dimensional.dba_at_comcast.net
> View Matthew Parker's profile on LinkedIn
> <http://www.linkedin.com/pub/matthew-parker/6/51b/944/>
> www.dimensionaldba.com
>
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Dimensional DBA
> Sent: Monday, June 13, 2016 10:52 AM
> To: fmhabash_at_gmail.com; 'Oracle-L Group'
> Subject: RE: Grid (RAC & Standalone) Unexpected Node Reboots Upon Device Path Failures
>
> Does it happen every time or sporadically?
>
> Can you provide an example LUN entry from your multipath.conf, along with
> the values (or combination of values) you are using for those settings?
> Some of them are binary opposites of each other.
>
> What UCS Manager version are you running, which firmware bundle patch, and
> which blade type are you having problems with?
>
> Do the cluster logs and OS logs show that all paths timed out?
>
> There are a variety of failure points, and each failure point had a
> different solution.
>
> These have included an administrator modifying templates in UCS Manager,
> which caused the nodes to reboot.
>
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of fmhabash_at_gmail.com
> Sent: Monday, June 13, 2016 10:05 AM
> To: Oracle-L Group
> Subject: Grid (RAC & Standalone) Unexpected Node Reboots Upon Device Path Failures
>
> We are experiencing a perplexing issue that we have not been able to
> resolve through root cause analysis (RCA). Grid nodes (RAC or standalone)
> reboot unexpectedly and sporadically (not every time) when we fail over a
> hardware component such as a UCS fabric interconnect, an HBA, or a storage
> controller. On some systems, we have also noticed filesystems going
> read-only.
>
>
>
> All devices are configured with multipathing over a minimum of 4 paths.
> Multipathing is provided by either EMC PowerPath or native Linux DM-MPIO.
>
>
>
> All nodes use 11gR2 ASM and/or Linux LVM, with a subset using ASMLib, and
> run OEL 6.3-6.6 with RDBMS 11gR2.
>
>
>
> I know there are a zillion factors to consider here, but to keep things
> simple, let's focus on DM-MPIO for now. We believe all these symptoms are
> related to how the software (Oracle ASM or Linux LVM) reacts to the loss
> of a path in a multipathed setup, so we focused on the multipath.conf
> settings that control I/O path failover, namely:
>
>
>
> Path_retry
>
> Queue_if_no_path
>
> Polling_interval
>
> Rr_min_io
>
> Failback immediate
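A rough way to gather these settings from each node for comparison is a minimal Python sketch like the one below. It is not a full multipath.conf parser: it assumes one `keyword value` pair per line, ignores `#` comments and braces, and does not record which `defaults`, `devices` or `multipaths` section a keyword came from. The `WATCHED` set and the `SAMPLE` text are hypothetical. It also flags one overlap: current multipath.conf(5) documentation describes `features "1 queue_if_no_path"` as superseded by `no_path_retry`, with `no_path_retry` taking precedence when both are set.

```python
import re

# Hypothetical watch list built from keywords in this thread; adjust as needed.
WATCHED = {"no_path_retry", "polling_interval", "rr_min_io", "failback", "features"}

# One "keyword value" pair per line; the value may be double-quoted.
LINE_RE = re.compile(r'^\s*(\w+)\s+(.+?)\s*$')

# Hypothetical sample, not taken from any real system.
SAMPLE = """
defaults {
    polling_interval 5
    features "1 queue_if_no_path"
}
devices {
    device {
        vendor "EMC"
        no_path_retry 6   # per-array override
        failback immediate
        rr_min_io 1000
    }
}
"""


def extract_settings(text):
    """Return (keyword, value) pairs for watched keywords, in file order."""
    found = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]           # drop comments
        match = LINE_RE.match(line)
        if not match:
            continue                           # blank lines and lone braces
        key, value = match.group(1), match.group(2).strip('"')
        if key in WATCHED:                     # section headers like "defaults {" are skipped here
            found.append((key, value))
    return found


def queueing_conflicts(settings):
    """Warn when a queue_if_no_path feature coexists with no_path_retry."""
    keys = {key for key, _ in settings}
    queues = any(key == "features" and "queue_if_no_path" in value
                 for key, value in settings)
    if queues and "no_path_retry" in keys:
        return ["features queue_if_no_path and no_path_retry are both set; "
                "no_path_retry is documented to take precedence"]
    return []


if __name__ == "__main__":
    settings = extract_settings(SAMPLE)
    for key, value in settings:
        print(f"{key} = {value}")
    for warning in queueing_conflicts(settings):
        print("WARNING:", warning)
```

Because multipathd merges the file with its built-in per-array defaults, the running configuration (for example the output of `multipathd -k"show config"`) may be a more representative input than /etc/multipath.conf alone.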
>
>
>
> 1) Have you experienced issues such as unexpected node reboots or
> filesystems going read-only when failing over the hardware components
> listed above?
>
> 2) What was your resolution?
>
> 3) What values do you use for the multipath.conf parameters listed above?
>
>
>
> Thanks, all
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jun 13 2016 - 21:06:23 CEST