Re: Severity 01

From: Tim Gorman <tim_at_evdbt.com>
Date: Fri, 04 Jun 2010 08:14:35 -0600
Message-ID: <4C090A4B.1070408_at_evdbt.com>




  


All,

Something to consider about "sev1" SRs with Oracle Support, having to do with simple human nature...

Oracle Support has multiple support centers for each area of expertise, scattered all around the world, resulting in a "follow the sun" ability to work a problem continuously, as responsibility for the Sev1 SR moves from time-zone to time-zone.  This continuous-processing capability is a good thing overall.

One possible problem with this "follow-the-sun" approach is that the SR is not necessarily worked by the same analyst every time it re-enters a time-zone.  Like any other workplace, Oracle Support is juggling employee schedules and issues.  As a result, the SR spends 6-8 hours with each analyst, and depending on the analyst, they may spend most of those 6-8 hours coming up to speed on the issues, then do one or two rounds of Q&A to try to make progress, then pass the SR on to the next time-zone.  Thus, a sev1 SR can spend all of its time bouncing from one analyst to another.  As it ages, new sev1 SRs inevitably enter the queue and distract attention, and the analyst ends up juggling the older sev1 SRs just long enough to pass them on to the next time-zone.  Not that "juggling" is the intention of any of the analysts, but it is pretty much a natural consequence given the circumstances.

So, although your management may insist on pushing the SR to "severity 1", for a complex, involved issue you may get better results from leaving it at severity 2, then escalating to a duty manager to make sure the best person for the problem is assigned to the SR.  It may seem counter-intuitive (or maybe not?), but 8 hours/day of one good analyst can achieve faster resolution than 24 hours/day of 3-4 different analysts every 6-8 hours.  Everyone knows that constant processing from the Oracle Support side requires constant input from the customer side, but that also includes continuous guidance from the customer side to ensure that progress is being made and "juggling" is not occurring.
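
To put rough numbers on the 8-hours-versus-24-hours comparison, here is a minimal sketch in Python.  It is only an illustration, not anything measured at or published by Oracle Support: the function names, the 8-hour shift, and the ramp-up figures are all assumptions, with the ramp-up range picked to reflect an analyst spending "most of those 6-8 hours coming up to speed".  It also assumes the worst case, where every shift brings a new analyst who has to ramp up from scratch.

```python
def productive_hours_single(days, shift_hours=8.0, ramp_up_hours=6.0):
    """One analyst keeps the SR for the whole period.

    Assumption: the ramp-up cost is paid only once, at the start.
    """
    return max(0.0, days * shift_hours - ramp_up_hours)


def productive_hours_follow_the_sun(days, shift_hours=8.0, ramp_up_hours=6.0,
                                    shifts_per_day=3):
    """A different analyst picks up the SR on every shift.

    Assumption: the ramp-up cost is paid again on every handoff.
    """
    per_shift = max(0.0, shift_hours - ramp_up_hours)
    return days * shifts_per_day * per_shift


if __name__ == "__main__":
    days = 5  # hypothetical length of the incident, in days
    for ramp_up in (5.0, 6.0, 7.0):
        single = productive_hours_single(days, ramp_up_hours=ramp_up)
        rotating = productive_hours_follow_the_sun(days, ramp_up_hours=ramp_up)
        print(f"ramp-up {ramp_up:.0f}h: single analyst {single:.0f}h, "
              f"follow-the-sun {rotating:.0f}h of productive work")
```

Under these assumed numbers, follow-the-sun still comes out ahead over five days with a 5-hour ramp-up, while at 6 hours or more per handoff the single analyst does.  The point is only that the outcome hinges on how much of each shift goes to coming up to speed.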

Of course, you'd have to consider the nature of the problem carefully, but if it is a complex and involved situation, such as a recovery, this is something to consider...

Hope this helps...
Tim Gorman
consultant -> Evergreen Database Technologies, Inc.
postal     => P.O. Box 630791, Highlands Ranch CO  80163-0791
website    => http://www.EvDBT.com/
email      => Tim_at_EvDBT.com
mobile     => +1-303-885-4526
fax        => +1-303-484-3608
Lost Data? => http://www.ora600.be/ for info about DUDE...


Ozgur Ozdemircili wrote:
Hi all,

At last we were able to recover the database. Here are the details for anyone who, hopefully not, runs into a problem like this:

- We noticed the problem on Friday, just before the end of the working day.
- Our first reaction was to shut down the database, as the instances (3-node RAC) were restarting with ORA-600 errors and SMON child process exited errors.
- Investigating the problem, we found it was a logical corruption.
- SMON tried to recover the corrupted blocks (we had 15), gave up after a number of attempts, and restarted the instance.
- Oracle asked us to create a test environment and recover the database to just before the incident occurred.(?)
- We provided the details to Oracle.

The solution is to escalate! Just after opening the SR, you are expected to call them to escalate the problem.

Thanks all..


Özgür Özdemircili
http://www.acikkod.org
Code so clean you could eat off it


On Sat, May 29, 2010 at 6:23 PM, Madhu Sreeram <madhusreeram_at_gmail.com> wrote:


On Fri, May 28, 2010 at 6:06 PM, Ozgur Ozdemircili <ozgur.ozdemircili_at_gmail.com> wrote:
Hi,

Well, not good news. It seems one of our tables got corrupted, causing all RAC instances to restart. We have opened a Severity 1 SR and are waiting.

Please share your experiences on this:

- The service provider says a table got corrupted and that it is causing the problem. Is that even possible?

- How long does it normally take for the Oracle technicians to respond?



Özgür Özdemircili
http://www.acikkod.org
Code so clean you could eat off it

It's possible. It usually happens with logical corruptions, where SMON is trying to apply some recovery and freaks out, crashing the instance. But you should see an ORA-00600 or ORA-07445 in the alert.log; it just can't happen silently.
We recently encountered the error ORA-600 [kddummy_blkchk], which caused instance crashes. Initially it seemed OK (it happened about midnight), with just one crash in two to three hours during batch loads, but as the morning workload started, the crashes came almost every couple of minutes. This was on a 3-node RAC. We did have a sev1 SR, but the support was disappointing. We put the tablespace in offline mode to get stability. We have someone responding, but so far none of their suggestions has worked. It's been about 3 weeks and it's still unresolved. Still waiting on a patch.

On a side note, if you use "allocate extent", consider applying patch #6647480, or you could potentially cause corruptions.


-Madhu Sreeram.

-- http://www.freelists.org/webpage/oracle-l Received on Fri Jun 04 2010 - 09:14:35 CDT
