RE: OS upgrade for RAC

From: Hameed, Amir <Amir.Hameed_at_xerox.com>
Date: Wed, 3 Jun 2015 18:46:10 +0000
Message-ID: <AF02C941134B1A4AB5F61A726D08DCED1FDBCA7C_at_USA7109MB012.na.xerox.net>



Thanks, Riyaj, for your feedback. It was very insightful. A few follow-up questions:

  (c) Database files in the NFS file system: NFS client software might have a bit more optimization in the later release, so having cluster nodes with different NFS clients may cause issues; again, this is untested from the support point of view. If you don't use NFS files for the database, ignore this point. (Just to be clear, I am not against NFS, and I do agree that direct NFS is highly performant. Having different NFS clients on different nodes, for the same database, is probably not certified.)

Yes, we are using dNFS in this RAC environment. The RAC and Grid stacks are also mounted via kNFS. However, each RAC node has its own Grid and RDBMS homes, and they are not shared between nodes. Since in a dNFS configuration kNFS is used only to mount the DB file systems, do you believe that, in theory, having a slightly different version of the kNFS client, with separate Grid and RDBMS home directories, could cause issues?
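
(For context, this is roughly how we confirm that dNFS is actually serving the database files and which kernel NFS client revision each node is running; treat it as a sketch only, the v$ column names are from memory:)

# On each node, as the oracle user:
$ nfsstat -m                          # kNFS mount options actually in effect for the DB mounts
$ modinfo | grep -i nfs               # kernel NFS client module revisions
$ echo "select svrname, dirname from v\$dnfs_servers;" | sqlplus -s "/ as sysdba"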

  (d) A different OS release inevitably brings a different firmware release too. So, quite possibly, the mix of firmware revisions is not certified by the hardware vendors either.

These servers are patched with OS patches on a quarterly basis, so I am assuming that the firmware is quite recent on all of them.

Thanks
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Riyaj Shamsudeen
Sent: Wednesday, June 03, 2015 12:19 PM
To: Hameed, Amir
Cc: mark_at_bobak.net; djeday84_at_gmail.com; oracle-l_at_freelists.org
Subject: Re: OS upgrade for RAC

Hi Amir,
  I am guessing here: the seemingly arbitrary (as Mark correctly pointed out) insistence on 1-2 days to complete the rolling upgrade is quite possibly triggered by the following concerns:

(a) Support may not have tested this configuration, so they are being overly cautious rather than giving you a better answer.

(b) They are concerned about OS bug fixes. For example, years ago there was an NTP bug that was fixed in a later release. The upgraded servers were catching up time quickly, but the not-yet-upgraded servers were not, and so they were slowly drifting away from the mean cluster time. We noticed that (accidentally) in the GI or CSSD log and had to apply a manual fix to correct the time. Had we let the servers keep drifting, they would eventually have restarted, as this leads to a condition similar to a missing-heartbeat issue. But that was many years ago. So, the 1-2 days may have come from that kind of bad experience. (A quick way to spot that kind of drift is sketched after this list.)

  (c) Database files in the NFS file system: NFS client software might have a bit more optimization in the later release, so having cluster nodes with different NFS clients may cause issues; again, this is untested from the support point of view. If you don't use NFS files for the database, ignore this point. (Just to be clear, I am not against NFS, and I do agree that direct NFS is highly performant. Having different NFS clients on different nodes, for the same database, is probably not certified.)

  (d) A different OS release inevitably brings a different firmware release too. So, quite possibly, the mix of firmware revisions is not certified by the hardware vendors either.
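
  (On the drift concern above: the kind of check that would catch it early, run against every node. The host names here are made up and the exact ntpq output varies, so this is only a sketch:)

$ for h in node1 node2 node3 node4; do
>   echo "== $h =="
>   ssh $h 'ntpq -p; date -u'       # peer offsets and current UTC time on each node
> done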

   Setting aside the above items, let's review the communication between the nodes:

(1) The network heartbeat and other RAC messages flow over the UDP protocol. I doubt that this upgrade will change any functionality at that low-level layer, so having two different OS versions should be fine. (A few quick checks covering points 1-3 are sketched after this list.)

(2) Cluster nodes should not drift too far from the mean cluster time. I am almost positive that you use NTP, and that is a rock-solid product at this point. CTSSD is an Oracle product. So, in this case too, having two different OS versions should be fine.

(3) The CSSD-based disk heartbeat also should not cause issues, as it is an Oracle binary. Further, I doubt that there will be a big functional change in the low-level I/O layer either.
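
   (If you want to sanity-check these three areas while the nodes are at mixed OS levels, something along these lines should do. This is only a sketch; $GRID_HOME stands for your Grid Infrastructure home, and the exact output formats may differ:)

# (1) Which networks carry the interconnect, plus Solaris UDP statistics (watch for growing error/drop counters)
$ $GRID_HOME/bin/oifcfg getif
$ netstat -s -P udp

# (2) Whether CTSSD is merely observing (NTP in charge of time) or actively adjusting it
$ $GRID_HOME/bin/crsctl check ctss

# (3) Voting disk status and the CSS timeouts that govern node evictions
$ $GRID_HOME/bin/crsctl query css votedisk
$ $GRID_HOME/bin/crsctl get css misscount
$ $GRID_HOME/bin/crsctl get css disktimeout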

   If I were you, this is the plan I would recommend. Say you have n nodes in the cluster; then:

    (i) Upgrade the first node, bring it back into the cluster, and test it for a day or two.
    (ii) Upgrade roughly half of the remaining nodes; test for a day.
    (iii) Upgrade the remaining nodes.

  This plan should reduce the concerns; however, I don't know whether this approach is feasible for you. A rough sketch of each per-node pass follows below.
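
  (Roughly, each per-node pass would look like the following. This is only a sketch, assuming standard Grid locations, with $GRID_HOME standing for your Grid Infrastructure home:)

# On the node being upgraded, as root:
$GRID_HOME/bin/crsctl stop crs             # stop the full Grid/RAC stack on this node
# ... apply the Solaris 10 Update 11 upgrade and reboot ...
$GRID_HOME/bin/crsctl start crs            # rejoin the cluster
# Then, from any node, confirm all daemons are back online before touching the next node:
$GRID_HOME/bin/crsctl check cluster -all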

Cheers

Riyaj Shamsudeen
Principal DBA,
Ora!nternals - http://www.orainternals.com - Specialists in Performance, RAC and EBS
Blog: http://orainternals.wordpress.com/
Oracle ACE Director and OakTable member (http://www.oaktable.com/)

Co-author of the books: Expert Oracle Practices (http://tinyurl.com/book-expert-oracle-practices/), Pro Oracle SQL (http://tinyurl.com/ahpvms8), Expert RAC Practices 12c (http://tinyurl.com/expert-rac-12c), Expert PL/SQL Practices (http://tinyurl.com/book-expert-plsql-practices)

On Tue, Jun 2, 2015 at 9:04 AM, Hameed, Amir <Amir.Hameed_at_xerox.com> wrote:

Thanks, Mark.
What you have stated is exactly what I was wondering too, especially since this is a revision (update) upgrade within Solaris 10 and not a Solaris 10-to-Solaris 11 upgrade.

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Mark J. Bobak
Sent: Tuesday, June 02, 2015 11:56 AM
To: djeday84_at_gmail.com; oracle-l_at_freelists.org
Subject: Re: OS upgrade for RAC

1-2 days seems pretty arbitrary to me. What, exactly, will break on day 3, that didn't break on day 1 or day 2?

Obviously, you probably shouldn't run that way indefinitely, but doing one node per weekend for four weekends seems reasonable to me.

-Mark

On Tue, Jun 2, 2015, 11:20 AM Anton <djeday84_at_gmail.com> wrote:

It is Linux, but because of bugs this is what we had to work with:

[root_at_pk7db01 ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Kernel \r on an \m

[oracle_at_pk7db02 /home/oracle]$ cat /etc/issue
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
Kernel \r on an \m

seems to work fine, at least this week.

On 06/02/2015 05:05 PM, Hameed, Amir wrote:

We are running a four-node RAC (both Grid and RDBMS are 11.2.0.4) on Solaris 10 Update 10. We need to upgrade our database and Grid to 12c, and that requires the OS to be at a minimum of Solaris 10 Update 11. What I would like to find out is this: if we do a rolling OS upgrade where we upgrade one RAC node at a time from Solaris 10 Update 10 to Solaris 10 Update 11, for how long can these RAC nodes stay out of sync in terms of the OS revision level? Is it possible to upgrade the OS on one node every week and spread the entire process over four weeks? Oracle's response was that the maximum these hosts can stay out of sync is 1-2 days, but I would like to validate that with the list.
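
(For what it is worth, the Solaris counterpart of the /etc/issue check shown above, to confirm the update level on each node; a sketch, run per node:)

$ cat /etc/release      # the release string shows whether the node is on Update 10 or Update 11
$ uname -v              # kernel patch level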

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jun 03 2015 - 20:46:10 CEST
