Re: DB recovery 'opportunity'

From: Ed Stevens <ed.stevens_at_nospam.noway>
Date: Fri, 29 Jul 2005 18:33:00 -0500
Message-ID: <p0fle1dd0n3dv58sdirm7hjihdf0id6jqu@4ax.com>

On Fri, 29 Jul 2005 22:47:40 +0200, "Matthias Hoys" <idmwarpzone_NOSPAM__at_yahoo.com> wrote:

>
>"Ed Stevens" <ed.stevens_at_comcast.net> wrote in message
>news:1122665840.276869.108640_at_g47g2000cwa.googlegroups.com...
>> DA Morgan wrote:
>>> Ed Stevens wrote:
>>> > Platform - Oracle 8.1.7 on Win2k server
>>> >
>>> > Disclaimer: I was called in to pull this db out of the fire. I have
>>> > never seen this db before, and had no hand in its setup or current
>>> > condition. The guy that normally covers this db is unavailable.
>>> >
>>> > App and DB started reporting problems on Jul 22. I was called in on
>>> > July 28.
>>> > Startup of db fails. Here are the last few lines from the alert log:
>>> >
>>> > Completed: ALTER DATABASE MOUNT
>>> > Thu Jul 28 14:33:18 2005
>>> > ALTER DATABASE OPEN
>>> > ARCH: Beginning to archive log# 3 seq# 831
>>> > Thu Jul 28 14:45:10 2005
>>> > ARCH: I/O error 19502 archiving log 3 to 'E:\ORAARCH\XVLP\ARCH_831.ARC'
>>> > ARCH: Archiving not possible: error count exceeded
>>> > ARCH: Failed to archive log# 3 seq# 831
>>> > ORA-16038 signalled during: ALTER DATABASE OPEN...
>>> >
>>> > The first thing I checked was available disk space at the archive
>>> > destination. There were several dozen gig available. All web searches
>>> > (MetaLink, this ng, AskTom, Google ...) keep pointing to disk full
>>> > conditions. We do know that the server admins have been monkeying with
>>> > the disks, which are in a SAN unit. We have gotten little info from
>>> > them ... they (and the server) are located in Mexico, and their English
>>> > is little better than our Spanish.
>>> >
>>> > Further tidbits:
>>> >
>>> > There is very little alert log history available. There are scripts on
>>> > the server for stopping and starting the DB's (two of them) and part of
>>> > the shutdown renames the alert log to a backup, keeping only three
>>> > generations. Unfortunately, this was done 3 times in one day -- after
>>> > the problems began -- so any info on what led into the current
>>> > situation has been lost.
>>> >
>>> > On the day the problems began the original DBA, for reasons unknown,
>>> > modified the init.ora file, removing references to the 2d and 3d
>>> > control files. Those files still exist, but of course are out of sync
>>> > with the one remaining active file.
>>> >
>>> > Now, for the real kicker ... there are no backups ......
>>> >
>>> > Fortunately, the way this app shares data with the mainframe, we *CAN*
>>> > recover by recreating the db from scratch and having the app issue a
>>> > request to reload from the mainframe. But as an educational exercise,
>>> > I'd like to explore other possibilities -- just in case I find myself
>>> > in a similar situation with an app that doesn't so easily recover
>>> > itself.
>>> >
>>> > The course of action that seems best is to:
>>> > 1 - stop the db
>>> > 2 - copy the one active control file over the two old ones (with
>>> > corresponding renaming)
>>> > 3 - re-instate the control file references in init.ora
>>> > 4 - startup nomount
>>> > 5 - open resetlogs
>>> >
>>> > What say the jury?
>>> >
>>> > Yes, I've modified the shutdown script to keep much more alert.log
>>> > history, and will be addressing the lack of backup ...
>>>
>>> I say what in the alert log that you showed us makes you think a control
>>> file has anything to do with the problem?
>>>
>>> I say a quick trip to metalink with the error messages will quickly
>>> reveal a solution.
>>>
>>> Based on your disclaimer I am not in the market to give you a solution
>>> given what appears to be your lack of experience and concerns that 'the
>>> solution' might make things worse than they already are.
>>> --
>>> Daniel A. Morgan
>>> http://www.psoug.org
>>> damorgan_at_x.washington.edu
>>> (replace x with u to respond)
>>
>> Daniel,
>>
>> Well, nothing in the alert log made *me* think the control file had
>> anything to do with the i/o error and failure to start. I can't answer
>> for the guy who made that change to the init.ora file. I simply
>> brought it up as another irregularity in a bad situation and another
>> thing that also needs to be fixed.
>>
>> Searches of MetaLink (and other sites listed in my original) keep
>> pointing to being out of disk space as the cause, and the cure to
>> simply create (by whatever means) more space, and the db will
>> self-recover. I have certainly seen this many times when an archive
>> destination filled up, but in this case, we have plenty of space.
>> However, at this point I'm assuming the same applies: fix the disk
>> problem (whatever it is) and the db will self-recover.
>>
>> And we definitely have a more fundamental disk problem. Further
>> investigation finds the Windows event log flooded with msgs about
>> writes to the page file timing out. And as an experiment, I went to
>> the archive destination directory and tried to simply copy one of the
>> older archivelog files (only 10 MB in size) to 'dummy.log'. An hour
>> later, when it had not finished, and all other tasks on the server
>> seemed to be grinding to a halt, I gave up and killed my remote
>> session.
>>
>> So at this point, we've tossed it back to the server and storage admins
>> and informed all concerned that we can't do any more until the server
>> itself and its disk problems are stabilized. When that is done I'll
>> try a simple restart of the DB and see what happens, then go from there.
>>
>
>Could be a problem with the HBA, the fiber cable, incorrect HBA Windows
>drivers or config, ... Work for the storage admins! Which SAN are you using?
>Windows 2000 is not really SAN friendly, Windows 2003 is much better ...
>
>
>Matthias
>

Those are all questions I can't answer. If it were any of our other Windoze boxes I'd just walk down the hall and chat with the SA. But then, if it were one of *their* boxes, we'd never have been in this situation! ;-)
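
On the alert log history mentioned in the original post: below is a minimal sketch of a rotation that keeps a configurable number of generations, so that several restarts in one day do not wipe out the history. It is written in Python rather than as the actual shutdown script, which is not shown in this thread. The function name `rotate_log`, the `keep` default of 30, and the `.N` suffix naming are all assumptions made for illustration.

```python
from pathlib import Path


def rotate_log(log_path: Path, keep: int = 30) -> None:
    """Rename log -> log.1 -> log.2 ..., keeping at most `keep` old generations.

    Hypothetical helper: the name, default, and suffix scheme are assumptions.
    """
    if not log_path.exists():
        return  # nothing to rotate

    # Remove the oldest generation first so the shift below never
    # renames onto an existing file (which fails on Windows).
    oldest = log_path.with_name(f"{log_path.name}.{keep}")
    if oldest.exists():
        oldest.unlink()

    # Shift existing generations up by one, highest number first.
    for n in range(keep - 1, 0, -1):
        src = log_path.with_name(f"{log_path.name}.{n}")
        if src.exists():
            src.rename(log_path.with_name(f"{log_path.name}.{n + 1}"))

    # The current log becomes generation 1.
    log_path.rename(log_path.with_name(f"{log_path.name}.1"))
```

Called once per shutdown with the path to the instance's alert log, this keeps the last `keep` logs. A count-based scheme can still be exhausted by enough restarts, so a large `keep` (or names based on timestamps) is the safer choice.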
