Re: avoiding corrupted block (ora-01578)

From: S. Tran <sam_at_home>
Date: Tue, 8 Jun 1999 21:41:46 GMT
Message-ID: <375D8E1A.4B6ACC@home>


Once upon a time, in my previous life, I encountered the same nightmare that you
have, so in order to have a better life next time I should share this knowledge,
or my conscience will condemn me forever. Now, please deposit $1 million to my
bank account #........ :-P Seriously:
According to Oracle, you should run the ANALYZE statement TWICE on the object
that raised the error (note: NEVER run ANALYZE on ANY object owned by SYS.
Period!), then review the results against three different scenarios (a sketch of
the commands follows the list):

  1. If, after ANALYZE has run twice, you receive an ORA-1578 with the same arguments each time, there is a physically corrupt block. The data within the corrupt block(s) is lost, and the corrupted object(s) need to be recovered.
  2. If, after ANALYZE has run twice, you receive an ORA-1578 with different arguments each time, this is more likely to be a hardware problem. Check for I/O problems, controller problems, and outdated drivers, and run hardware diagnostics.
  3. If there are no errors after running ANALYZE twice, it could still be a hardware problem - monitor the system carefully and call Worldwide Customer Support.
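
As a reference, here is a minimal sketch of those commands. The owner and table
name (scott.emp) are made up for illustration - substitute the object named in
your error - and the DBA_EXTENTS query simply maps the <file#> and <block#>
arguments reported by ORA-1578 back to the segment that owns the block:

REM Run this twice against the object that raised ORA-1578
ANALYZE TABLE scott.emp VALIDATE STRUCTURE CASCADE;

REM Map the file# and block# from the ORA-1578 message to a segment
SELECT owner, segment_name, segment_type
  FROM dba_extents
 WHERE file_id = &file_number
   AND &block_number BETWEEN block_id AND block_id + blocks - 1;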

In addition, you may want to enable block CHECKSUMs for additional tracing and
detection of block corruption (available in release 7.2 or higher). Every time a
block is written to disk, Oracle calculates a checksum and stores it in the data
block header; the next time the block is read, the checksum is validated to
detect corruption. This can decrease the database's performance by 10% to 50%,
so it is better tried at night, when most of your users are asleep.

To enable block CHECKSUM, follow these steps:

  1. Shut down the instance

CONNECT INTERNAL
SHUTDOWN or SHUTDOWN IMMEDIATE

2. Edit the init%SID%.ora and add/change the following lines:

LOG_BLOCK_CHECKSUM = TRUE
DB_BLOCK_CHECKSUM = TRUE

Note: The COMPATIBLE parameter in the init%SID%.ora file should be set to 7.2.0 or higher for checksums to work.

3. Restart the instance

CONNECT INTERNAL
STARTUP
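
Once the instance is back up, a quick query against V$PARAMETER (my addition,
not part of the original note) confirms that both settings took effect:

SELECT name, value
  FROM v$parameter
 WHERE name IN ('db_block_checksum', 'log_block_checksum');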
B. Rollback Segments

If there is corruption in a rollback segment, it may be caused by hardware
problems. Block CHECKSUMs can be enabled for additional tracing information (see
Temporary Segments). In addition, run hardware diagnostics. Recovery of the
rollback segment's datafile/tablespace may be required; the recovery strategy
will depend on whether archiving is enabled.
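
If the database runs in ARCHIVELOG mode, that recovery is usually media recovery
of the affected datafile. A rough sketch follows - the rollback segment and
datafile names are hypothetical, the rollback segment generally has to be
offline first, and you should only attempt this with a usable backup and,
ideally, Support on the line:

ALTER ROLLBACK SEGMENT rbs01 OFFLINE;
ALTER DATABASE DATAFILE '/u02/oradata/PROD/rbs01.dbf' OFFLINE;
REM restore the datafile from backup at the OS level, then:
RECOVER DATAFILE '/u02/oradata/PROD/rbs01.dbf';
ALTER DATABASE DATAFILE '/u02/oradata/PROD/rbs01.dbf' ONLINE;
ALTER ROLLBACK SEGMENT rbs01 ONLINE;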

C. Index Segments

        Drop and recreate.
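
For an ordinary (non-SYS) index that usually looks like the following - the
index, table, column, and tablespace names here are invented for illustration;
take the real definition from your own build scripts or from DBA_INDEXES /
DBA_IND_COLUMNS before dropping anything:

DROP INDEX scott.emp_ename_idx;
CREATE INDEX scott.emp_ename_idx ON scott.emp (ename)
  TABLESPACE indx;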

D. Cluster Segments

        Call Worldwide Customer Support

E. Table Segments

        Get any trace files that might have been created.
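
Those trace files end up in the directories named by the dump destination
parameters; a quick way to locate them (again, my addition rather than part of
the faxed note):

SELECT name, value
  FROM v$parameter
 WHERE name LIKE '%dump_dest%';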

Best of luck - trust me, you'll need it.


Nathan Phan wrote:
>
> Hi people,
> We are new (6 months) to Oracle databases and have been hit by corrupted data
> blocks 5 times across 4 database instances. ( Should we hire another DBA who
> has better luck? ;) One of the interview questions will be: when was your last
> encounter with a corrupted block... ) Oracle support faxed us the steps on how
> to recover as best as possible; we framed the pages on the wall for easy
> information access, and fear that the paper will get corrupted too.
>
> Let's get serious.
>
> We can't live like this any more; too many resources and too much people
> effort ( from operators to sysadmins to DBAs to developers to help desks
> to end users ) are involved in doing recovery. Before we finish one, the next
> one pops up. We definitely need your advice and suggestions. Thanks in
> advance !
>
> The REAL questions I want to ask are:
> 1. How can we scan for any potential bad block before writing into it,
> without too much overhead?
> 2. We are using IBM SSA RAID 5; is this a bad choice? There is no error
> report from the SSA. After getting a corrupted block, we are still able to
> " cp /.../.../data.dbf /tmp" ( data.dbf contains the tablespace which
> contains the corrupted table ). Can I conclude that nothing is wrong with the
> file from the OS point of view and nothing is wrong with the hardware, and
> that the corruption is in the content of the file? If the hardware and the OS
> are not causing the problem, why can't Oracle solve the problem?
> 3. Under what circumstances is there a high chance of getting this
> problem? Too many extents? Too big or too small a db_block_size? Too many
> users doing reads and writes for too long?
> 4. For people who have encountered a similar problem:
> what platform,
> OS,
> type of disk,
> type of disk system,
> Oracle version
> are you on? I just want to find out if there is any consistent pattern.
>
> Oh, I almost forgot, we are on AIX 4.3.1, Oracle 7.3.3, SP high node,
> SSA RAID 5, SSA hard disks.
>
> I am sorry to have so many "?" marks in my posting; I am desperate,
> and it also reflects the situation we are in now. Once again, thanks in
> advance !
>
> Regards
> Nathan Phan
>
> Sent via Deja.com http://www.deja.com/
> Share what you know. Learn what you don't.
Received on Tue Jun 08 1999 - 16:41:46 CDT
