Re: Upgrade from 9.2 to 10.2 (NON-Rac)

From: Mark Strickland <strickland.mark_at_gmail.com>
Date: Tue, 17 Oct 2006 11:51:03 -0700
Message-ID: <90ad14210610171151y17dddd03g4cf1e3821841de93@mail.gmail.com>

I can't address the 9.2 to 10.2 upgrade steps other than to say TEST TEST TEST and document the steps carefully down to the keystroke. What I CAN talk about is the Upgrade From Hell that my co-DBA and I did this past weekend. We upgraded production from 10.1.0.3 to 10.1.0.5. Just a patchset, so a rather minor upgrade, really. In theory, at least. This is on Solaris 9, Veritas, Hitachi SAN, 3-node RAC, Data Guard with physical and logical standbys, and an RMAN catalog. We had tested and documented everything very carefully and were expecting a 3-hour cakewalk (but ready for anything, of course).

Well. It took from noon Saturday until 9:00 Sunday night to get stable again. The upgrade itself took close to 4 hours. However, for some so-far inexplicable reason, Oracle decided to switch the VIPs to a different network interface on each RAC server. We re-booted the 3 servers, then Veritas couldn't mount all the file systems. My co-DBA knows Veritas well and got that cleaned up. After another re-boot, the servers couldn't NFS-mount the file system that is used for DB_RECOVERY_FILE_DEST. That required a static-IP fix from our network engineer. So, once we got everything restarted, the instances started crashing after 20-40 minutes.

The rest of the weekend was spent on the phone with Oracle Support. We went through three staff shifts at Oracle Support, and each handoff required the new support engineer to go through the logs and trace files and get up to speed on the issue. We were about to punt and switch to single-node and turn on more CPUs when an engineer in either India or Australia (can't remember which...the engineer is Sandeep Singla...BRILLIANT!) was able to identify the cause of the problem in the VIP trace file. The VIP was occasionally timing out while checking the default gateway. The timeout threshold was 2 seconds and the engineer had us change that to 10. The timeouts were causing the instances on the node to crash.
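
To illustrate why raising the threshold mattered, here is a minimal Python sketch. It is not Oracle's VIP check, and the latency samples and the function name count_timeouts are made up for illustration rather than taken from our trace file. It only shows how a handful of slow gateway responses count as failures at a 2-second threshold but not at 10 seconds.

```python
# Illustrative only: the function name and the latency samples are hypothetical,
# not taken from the actual VIP trace file or from Oracle's code.

def count_timeouts(latencies_s, threshold_s):
    """Return how many gateway checks would be treated as timeouts."""
    return sum(1 for latency in latencies_s if latency > threshold_s)

# Hypothetical response times (in seconds) for successive default-gateway checks.
samples = [0.1, 0.3, 2.4, 0.2, 3.1, 0.4, 0.2, 5.0]

failures_at_2s = count_timeouts(samples, 2.0)    # original threshold
failures_at_10s = count_timeouts(samples, 10.0)  # threshold after the change

print(failures_at_2s, failures_at_10s)
```

With these made-up samples, three checks exceed the 2-second threshold and none exceed 10 seconds. That matches the pattern we saw: occasional slow responses from the gateway, not a gateway that was actually down.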
After 36 hours with 1 hour of sleep on a company sofa for each of us and working with three shifts at Oracle Support, our Production environment was stable again, just in the nick of time. I'm almost caught up on sleep and I'm starting to unclench.

As you might guess, I'm now even more motivated to understand RAC inside and out.

Other than using this forum to b***h about our upgrade experience, I hope to have provided useful information.

Vivek, if you'd like a copy of our upgrade plan, I'd be happy to send it. It won't be directly applicable to your upgrade, of course, but it might be useful.

Regards,
Mark Strickland
Seattle, WA

--
http://www.freelists.org/webpage/oracle-l

Received on Tue Oct 17 2006 - 13:51:03 CDT
