Re: ORA-01578: Datablock corruption

From: koert54 <nospam_at_spam.com>
Date: Fri, 24 Jan 2003 13:30:47 +0100
Message-ID: <3e3131c9$0$227$4d4efb8e@news.be.uu.net>

I would also recommend upgrading your DB to at least 8.0.6. Releases 8.0.4 & 8.0.5 were quite crappy - I've had a lot of corrupted blocks especially on 8.0.4 / AIX due to multiple dbwriters & async I/O. Furthermore these releases are not supported anymore by Oracle - a valid reason to 'force' your customer into upgrading ...

"Pascal Byrne" <byrne_at_icada.com> wrote in message news:b0qtel$s3l0l$1_at_ID-147008.news.dfncis.de...
> This is an interesting article. The trace events outlined may prove
useful.
>
> Thanks,
> pascal
>
>
> koert54 wrote:
> > From metalink :
> >
> >
> >
> > PURPOSE
> >
> > This article discusses block corruptions in Oracle and how they are
related
> >
> > to the underlying operating system and hardware. To better illustrate
the
> >
> > discussion, Unix is taken as the operating system of reference.
> >
> > SCOPE & APPLICATION
> >
> > For users requiring further understanding as to how a block could become
> >
> > corrupted.
> >
> >
> >
> > Block corruption has been a common occurrence on most UNIX based systems
and
> >
> > relational databases for many years. It is one of the most frequent ways
to
> >
> > lose data and cause serious business impact. Through a survey of
literary
> >
> > technical sources, this document will discuss several ways that block
> >
> > corruptions can occur, provide conclusions and possible solutions.
> >
> > To fully comprehend all the reasons for block corruption's, it is
necessary
> > to
> >
> > understand how I/O device subsystems work, how memory buffers are used
to
> >
> > support the reading and writing of data blocks, how blocks are sized on
both
> >
> > UNIX and Oracle, and how these three objects work together to maintain
data
> >
> > consistency.
> >
> > I/O devices are designed specifically for host machines and there have
been
> >
> > few attempts to standardize a particular interface across the industry.
Most
> >
> > software, including Oracle, on UNIX machines uses standard C program
calls
> > that
> >
> > in turn perform system calls to support the reading and writing of data
to
> >
> > disk. These system calls access I/O device software that retrieves or
writes
> >
> > data on disk.
> >
> > The UNIX system contains two types of devices, block devices and raw or
> >
> > character devices. Block devices look like random access storage devices
to
> >
> > the rest of the system while character devices include all other devices
> > such
> >
> > as terminals and network media. (Bach, 1990 314). These device types are
> >
> > important to understand because different combinations can increase
> > corruptions.
> >
> > Device drivers are configured by the operating system and the
configuration
> >
> > procedure generates or populates tables that form part of the code of
the
> >
> > kernel. This kernel to device driver interface is described by the block
> >
> > device switch table and the character device switch table. Each device
type
> >
> > has entries in these tables that direct the kernel to the appropriate
driver
> >
> > interfaces for the system calls. The open and close system calls of a
device
> >
> > file funnel through the two device switch tables, according to file
type.
> > The
> >
> > mount and umount system calls also invoke the device open and close
> > procedures
> >
> > for block devices. Read and write system calls of character special
files
> > pass
> >
> > through the respective procedures in the character device switch tables.
> > Read
> >
> > and write system calls of block devices and of files on mounted file
systems
> >
> > invokes the algorithms of the buffer cache, which invoke the device
strategy
> >
> > procedure. (Bach, 1990 314). This buffer cache plays an important role
in
> >
> > block corruptions since it is the location where data blocks are the
most
> >
> > vulnerable.
> >
> > The difference between the two disk interfaces is whether they deal with
the
> >
> > buffer cache. When accessing the block device interface, the UNIX kernel
> >
> > follows the same algorithm as for regular files, except that after
> > converting
> >
> > the logical byte offset into a logical block offset, it treats the
logical
> >
> > block offset as a physical block number in the file system. It then
accesses
> >
> > the data via the buffer cache and, ultimately, the driver strategy
> > interface.
> >
> > However, when accessing the disk via the raw interface, the kernel does
not
> >
> > convert the byte offset into the file but passes the offset immediately
to
> > the
> >
> > driver. The driver's read or write routine converts the byte offset to a
> >
> > block offset and copies the data directly to the user address space,
> > bypassing
> >
> > kernel buffers.
> >
> > Thus, if one process writes a block device and a second process then
reads a
> >
> > raw device at the same address, the second process may not read the data
> > that
> >
> > the first process had written, because the data may still be in the
buffer
> >
> > cache and not on disk. However, if the second process had read the block
> >
> > device, it would automatically pick up the new data, as it exists in the
> >
> > buffer cache. (Bach, 1990 328).
> >
> > Use of the raw interface may also introduce strange behavior. If a
process
> >
> > reads or writes a raw device in units smaller than the block size,
results
> > are
> >
> > driver-dependent. For instance, when issuing 1-byte writes to a tape
drive,
> >
> > each byte may appear in different tape blocks. (Bach 1990)
> >
> > The advantage of using the raw interface is speed, assuming there is no
> >
> > advantage to caching data for later access. Processes accessing block
> > devices
> >
> > transfer blocks of data whose size are constrained by the file system
> > logical
> >
> > block size. Furthermore, use of the block interface entails an extra
copy of
> >
> > data between user address space and kernel buffers, which is avoided in
the
> >
> > raw interface. For example, if a file system has a logical block size 1K
> >
> > bytes, at most 1K bytes are transferred per I/O operation. However,
> > processes
> >
> > accessing the disk as a raw device can transfer many disk blocks during
a
> > disk
> >
> > operation, subject to the capabilities of the disk controller.
> >
> > Disk controllers are hardware devices that control the I/O actions of
one or
> >
> > more disks. These controllers can also create a bottleneck in a system.
> >
> > (Corey, Abbey, Dechichio 1995). Controllers are the most frequent piece
of
> >
> > hardware to have and cause problems on many systems. When a system has
> >
> > multiple disks controlled by one controller, the results can be fatal.
The
> >
> > bottleneck on controllers is a common cause of write error.
> >
> > It is important to remember that Oracle and other products use these
device
> >
> > access methods to perform their work. It is also important to note the
added
> >
> > complexity that the Oracle kernel adds to the I/O game.
> >
> > The Oracle Relational Database Management System (RDBMS) keeps its
> >
> > information, including data, in block format. However, the Oracle data
block
> >
> > can be, and in most cases is, composed of several operating system
blocks.
> >
> > An Oracle database block is the physical unit of storage in which all
Oracle
> >
> > database data are stored in files. The Oracle database block size is
> >
> > determined by setting a parameter called db_block_size when the
> >
> > database is created. (Millsap, 1995).
> >
> > The most common UNIX block is 512 bytes but the Oracle block size can
range
> >
> > from 512 to 32K. The difference in block sizing between the operating
system
> >
> > and the Oracle kernel are beneficial for Oracle; boosting performance
gains
> >
> > while allowing UNIX to maintain small files with minimal wasted space.
The
> >
> > Oracle block can be considered a superset of the UNIX file system block
> > size.
> >
> > Each block of an Oracle data file is formatted with a fixed header that
> >
> > contains information about the particular block. This information
provides a
> >
> > means to ensure the integrity for each block and in turn, the entire
Oracle
> >
> > database. One component of the fixed header of a data block is called a
Data
> >
> > Block Address (DBA). This DBA is a 48 bit integer that stores the file
> > number
> >
> > of the Oracle database and the Oracle block number offset relative to
the
> >
> > beginning of the file. (Presley, 1993).
> >
> > Whenever there is a problem with the DBA, Oracle will signal an Oracle
error
> >
> > ORA-00600[3339][arg1][arg2] and possibly ORA-01578: Data block corrupted
in
> >
> > file # block #. These errors provide information that point to where the
> >
> > corruption exists and can provide several potential causes. (Presley,
1993).
> >
> > The ORA-00600[3339] has two arguments the are meaningful to the person
> >
> > evaluating the corruption. Argument 1 is the DBA that Oracle found in
the
> > data
> >
> > block just read from disk. Argument 2 is the DBA Oracle expected to find
in
> >
> > the data block it just read. If they are different, the ORA-00600[3339]
is
> >
> > signaled.
> >
> > Oracle uses the standard C system function calls to read and write
blocks to
> >
> > its database files. Once the block has been read it is mapped to shared
> >
> > memory by the operating system, After the block has been read into
shared
> >
> > memory, the Oracle kernel does verification checks on the block to
ensure
> > the
> >
> > integrity of the fixed header. The DBA check is the first verification
made
> >
> > on the fixed header. So why do DBAs become corrupt and how can we
identify
> >
> > and correct them?
> >
> > Case One
> >
> > --------
> >
> > The first case of block corruption occurs when the first argument of the
> >
> > ORA-00600[3339] error has the value of zero while argument two contains
the
> > DBA
> >
> > which Oracle was trying to retrieve. Remember that argument 1 is the DBA
> > just
> >
> > read from disk. Usually the first operating system block of an Oracle
block
> >
> > is zeroed out when there was a soft error on disk and the operating
system
> >
> > attempted to repair its block. In addition, disk repair utility programs
> > have
> >
> > caused this zeroing out effect.
> >
> > Programs that read from and write to the disk directly can destroy the
> >
> > consistency of file system data. The file system algorithms coordinate
disk
> >
> > I/O operation to maintain a consistent view of disk data structures,
> > including
> >
> > linked lists of free disk blocks and pointer from inodes to direct and
> >
> > indirect data blocks. Processes that access the disk directly bypass
these
> > if
> >
> > they run while other file system activity is going on. For this reason,
> > these
> >
> > programs should not be run on an active file system. (Bach, 1990 328).
> >
> > On some early versions of the Oracle RDBMS, a software bug specific to
UNIX
> >
> > platforms also caused the ORA-00600[3339]. This bug was part of the code
> > that
> >
> > dealt with multiple database writers.
> >
> > A database writer is a background process that is responsible for
managing
> > the
> >
> > contents of the data block buffer cache and the dictionary cache. It
reads
> >
> > the blocks from the datafiles and stores them in the Shared Global Area
> >
> > (SGA). The database writer also performs batch writes of changed blocks
back
> >
> > to the datafiles. (Loney, 1994 23). The SGA is a segment of memory
allocated
> >
> > to Oracle the contains data and control information particular to an
Oracle
> >
> > database instance.
> >
> > Using multiple database writers causes multiple background processes to
> >
> > perform disk operations at the same time. However, if there are process
> >
> > conflicts, incorrect values could be stored and corruption can occur.
Also,
> >
> > using multiple database writers with asynchronous I/O has been know to
cause
> >
> > similar results.
> >
> > Asynchronous I/O allows a process to execute a system call to start and
I/O
> >
> > operation and have the system call return immediately after the
operation is
> >
> > started or queued. Another system call is required to wait for the
operation
> >
> > to complete. The advantage of asynchronous I/O is that a process can
overlap
> >
> > its execution with I/O, or it can overlap I/O between different devices.
> >
> > (Stevens, 1990 163).
> >
> > Case Two
> >
> > --------
> >
> > In the second case, both arguments returned with the ORA-00600[3339]
error
> > are
> >
> > large numbers. There are several causes that signal this error.
> >
> > The DBA in the physical block on disk is incorrect. This can happen if
the
> >
> > block was corrupted in memory but was written to disk. This situation is
> >
> > quite rare and in most cases it is usually caused by memory faults that
go
> >
> > undetected. The DBA found in the block is usually garbage and not a
valid
> >
> > DBA. Argument two that is returned with the error is always a valid DBA.
> >
> > If there is a possibility of memory problems on the system, the database
> >
> > administrator can enable further sanity block checking by placing the
> >
> > following event parameters in the database instance init.ora parameter
file:
> >
> > event = "10210 trace name context forever, level 10"
> >
> > event = "10211 trace name context forever, level 10"
> >
> > _db_block_cache_protect= true
> >
> > These parameters force the Oracle RDBMS kernel to call functions that
check
> >
> > the block header and block trailer to ensure that the block is in the
proper
> >
> > format and has not been corrupted. The 10210 event parameter validates
data
> >
> > blocks for tables while the 10211 validates data blocks for indexes. The
> >
> > _db_block_cache_protect=true protects the cache layer from becoming
> > corrupted.
> >
> > This parameter will prevent certain corruption from getting to disk,
> > although
> >
> > it may crash the foreground of the database instance. It will help catch
> >
> > stray writes in the cache. When a process tries to write past the buffer
> > size
> >
> > in the SGA, it will fail first with a stack violation.
> >
> > If the database writer process detects a corrupted block in cache prior
to
> >
> > writing the block to disk, it will signal an ORA-00600[3398] and will
crash
> > the
> >
> > database instance. The block that is corrupted is never written to disk.
> >
> > Various arguments including the DBA are passed to the ORA-00600[3398]
and
> > after
> >
> > receiving such an error, simply attempt to restart the database
instance.
> >
> > There is no doubt that this can be a costly workaround to avoid block
> >
> > corruptions. However, the workaround once a corruption has occurred can
be
> >
> > even costlier.
> >
> > Blocks are sometimes written into the wrong places in the data file.
This is
> >
> > called "write blocks out of sequence." In this case, both DBAs returned
with
> >
> > the ORA-00600[3339] are valid. This typically happens when the operating
> > system
> >
> > I/O device driver fails to write the block in the proper location that
> > Oracle
> >
> > requested via the lseek() system call.
> >
> > Some hardware and operating system vendors supports large files or
"large
> > file
> >
> > systems" that maintain files larger that 4.2 gigabytes. This is larger
than
> >
> > what can be represented by a 32 bit unsigned integer. Therefore, the
> >
> > operating system must translate the offset transparent to the Oracle
kernel.
> >
> > Oracle does not support files larger than 2 gigabytes even though the
> >
> > operating system might. On large file systems, the configuration is such
> > that
> >
> > even smaller Oracle data files suffer corruptions caused by blocks being
> >
> > written out of sequence because the lseek() system call did not
translate
> > the
> >
> > correct location. (Velpuri, 1995).
> >
> > The lseek() system call is one of the most important calls related to
block
> >
> > corruption. The calculations that lseek() performs are often the cause
of
> >
> > block problems. To understand lseek() a brief discussion of byte
positioning
> >
> > is necessary.
> >
> > Every open file has a "current byte position" associated with it. This
is
> >
> > measured as the number of bytes from the start of the file. The create
> > system
> >
> > call sets the file's position to the beginning of the file, as does the
open
> >
> > system call. The read and write system calls update the file's position
by
> >
> > the number of bytes read or written. Before a read or write, an open
file
> > can
> >
> > be positioned using lseek(). The format is:
> >
> > lseek(int fildes, long offset, int whence);
> >
> > The offset and whence arguments are interpreted as follows: If whence is
0,
> >
> > the file's position is set to offset bytes from the beginning of the
file.
> > If
> >
> > whence is 1, the file's position is set to its current position plus the
> >
> > offset. If whence is 2, the file's position is set to the size of the
file
> >
> > plus the offset. The file's offset can be greater than the file's
current
> >
> > size, in which case the next write to the file will extend the file.
Lseek()
> >
> > returns a long integer byte offset of the file. (Stevens, 1990 40).
> >
> > There is great opportunity for miscalculation of an offset based on the
> >
> > lseek() system call. Though lseek is not the only system call culprit in
the
> >
> > block corruption problem, it is a major contributor.
> >
> > Case 3
> >
> > ------
> >
> > A third cause for block corruption is the requested I/O not being
serviced
> > by
> >
> > the operating system. In this case, both arguments returned from the
> >
> > ORA-00600[3339] are valid but the DBA found in argument one is from the
> > previous
> >
> > block read into shared memory prior to the current read request. The
calls
> >
> > that Oracle makes to lseek() and read() are checked for return error
codes.
> >
> > In addition, Oracle checks to see the number of bytes read in by the
read()
> >
> > system call to ensure that the block size or a multiple of the block
size
> > was
> >
> > read. Since these checks appeared to have been successful, Oracle
assumes
> >
> > that the direct read succeeded. Upon sanity checking, the DBA is
incorrect
> >
> > and the database operation request fails. Therefore, the I/O read
request
> >
> > really never took place. In this case, the DBA found can point to a
block of
> >
> > a different file.
> >
> > Case 4
> >
> > ------
> >
> > Another reason for block corruption is reading the wrong block from the
same
> >
> > device. Typically, this is caused by a very busy disk. In some cases,
the
> >
> > block read was off by 1 block but can range into several hundreds of
blocks.
> >
> > The DBAs returned with the ORA-00600[3339] are valid DBA's but are not
the
> > block
> >
> > requested. Since this occurs when the disk is very busy and under lots
of
> >
> > stress, try spreading datafiles across multiple disks and ensure that
the
> > disk
> >
> > drive can support the load.
> >
> >
> >
> > In the third and fourth situations, the database files will not be
> > physically
> >
> > corrupted and the operation can be tried again with success. Most
> > diagnostics
> >
> > testing will not reveal anything wrong with either the operating system
or
> > the
> >
> > hardware. However, the problem is due to operating system or hardware
> > related
> >
> > problems. (Velpuri, 1995).
> >
> > So what causes the operating system calls to behave the way they do and
how
> >
> > can companies try to minimize their risk? To evaluate these questions,
> >
> > another look into how UNIX works is required.
> >
> > UNIX vendors, in a attempt to speed performance, have implemented many
> >
> > features into the filesystem. The filesystem manages a large cache of
I/O
> >
> > buffers, called the buffer cache. This cache allows UNIX to optimize
read
> > and
> >
> > write operations. When a program writes data, the filesystem stores the
data
> >
> > in a buffer rather that writing it to disk immediately. At some later
point
> >
> > in time, the system will send this data to the disk driver, together
with
> >
> > other data that has accumulated in the cache. In other words, the buffer
> >
> > cache lets the disk driver schedule disk operations in batches. It can
make
> >
> > larger transfers and use techniques such as seek optimization to make
disk
> >
> > access more efficient. This is called write-behind.
> >
> > When a program reads data, the system first checks the buffer cache to
see
> > if
> >
> > the desired data is already there. If the data is already in the buffer
> >
> > cache, the filesystem does not need to access the disk for those blocks.
It
> >
> > just gives the user the data it found in its buffer, eliminating the
need to
> >
> > wait for a disk drive. The filesystem only needs to read the disk if the
> > data
> >
> > isn't already in the cache. To increase efficiency even further, the
> >
> > filesystem assumes the program will read the file consecutively and read
> >
> > several blocks from the disk at once. This increases the likelihood that
the
> >
> > data for future read operations will already be in the cache. (Loukides,
M.,
> >
> > 1990) This also increases the chance of block corruption.
> >
> > As a filesystem gets busy and buffers are being read, modified, written,
and
> >
> > aged out of the cache the chance of the kernel reading or writing the
wrong
> >
> > block increases. Also, the more complex the scheme to read from and
write to
> >
> > disk, the greater the likelihood of function failure.
> >
> > The UNIX kernel uses the strategy interface to transmit data between the
> >
> > buffer cache and a device, although the read and write procedures of
> > character
> >
> > devices sometime use their block counterpart strategy procedure to
transfer
> >
> > data directly between the device and the user address space. The
strategy
> >
> > procedure may queue I/O jobs for a device on a work list or do more
> >
> > sophisticated processing to schedule I/O jobs. Drivers can set up data
> >
> > transmission for one physical address or many, as appropriate. The UNIX
> >
> > kernel passes a buffer header address to the driver strategy procedure.
The
> >
> > header contains a list of addresses and sizes for transmission of data
to or
> >
> > from the device. This is also how the swapping operations work. For the
> >
> > buffer cache, the kernel transmits data from one address; when swapping,
the
> >
> > kernel transmits data from many data addresses. If data is being copied
to
> > or
> >
> > from the user's address space, the driver must lock the process in
memory
> >
> > until the I/O transfer is complete.
> >
> > The kernel loses control over a buffer only when it waits for the
completion
> >
> > of I/O between the buffer and the disk. It is conceivable that a disk
drive
> >
> > is corrupt so that it cannot interrupt the CPU, preventing the kernel
from
> >
> > ever releasing the buffer. There are processes that monitor the hardware
for
> >
> > such cases and zero out the block and return an error to the kernel for
a
> > bad
> >
> > disk job. (Bach, 1990 52).
> >
> > On the UNIX level there are several utilities that will check for bad
disk
> >
> > blocks and zero out any blocks they find corrupted. These utilities do
not
> >
> > realize that the block in question may be an Oracle RDBMS block and zero
out
> >
> > the block by mistake.
> >
> > In (Silberschatz, Galvin, 1994), the authors consider the possible
effect of
> > a
> >
> > computer crash. In this case, the table of opened files is generally
lost,
> >
> > and with it any changes in the directories of opened files. This event
can
> >
> > leave the file system in an inconsistent structure. Frequently, a
special
> >
> > program is run at reboot time to check for and correct disk
inconsistencies.
> >
> > The consistency checker compares the data in the directory structure
with
> > the
> >
> > data blocks on disk, and tries to fix and inconsistencies it finds.
> >
> > (Silberschatz, Galvin, 1994) This will often result in the reformatting
of
> >
> > blocks which will cause the Oracle block information to be removed. This
> > will
> >
> > definitely cause Oracle corruption.
> >
> > It is important to realize that monitoring of hardware is required for
all
> >
> > operating systems. Hardware monitors can sense electrical signals on the
> >
> > busses and can accurately record them even at high speed. A hardware
monitor
> >
> > keeps observing the system even when it is malfunctioning, and thus, it
can
> > be
> >
> > used to debug the system. (Jain, 1991 99) These tools can help determine
the
> >
> > cause of the problem and detect problems like controller error and media
> >
> > faulting which are frequent corruption contributors.
> >
> > In any case, there are many opportunities for blocks, either on disk or
in
> > the
> >
> > buffer cache, to become corrupt. Fixing the corruption can sometimes
provide
> >
> > even greater opportunities.
> >
> >
> >
> > Conclusion
> >
> > ----------
> >
> > Data block corruption is an ongoing problem on all operating systems,
> >
> > especially UNIX. There are many types and causes of corruptions to
consider.
> >
> > Advanced system configurations can increase the chance and hardware
problems
> >
> > are a common source of corruptions. When receiving block corruption
errors,
> >
> > remember that a couple of them are not physical corruptions but memory
> >
> > corruptions that are never written to disk.
> >
> > Oracle Customer Support provides a number of bulletins on block
corruption
> >
> > problems that help recover what is left of the data once corruption has
> >
> > occurred. If block corruption occurs on a machine, be sure to identify
the
> >
> > type of corruption and establish a plan for its correction.
> >
> > "Pascal Byrne" <byrne_at_icada.com> wrote in message
> > news:b0mb02$qmmgv$1_at_ID-147008.news.dfncis.de...
> >
> >>Hi,
> >>One of our customers got this error on his production database
> >>which is Oracle 8.0.5.1 on SuSE Linux 7.1 (kerne2.4.16-4GB)
> >>using non-raid removable SCSI disks. Database updates are
> >>normally by JDBC using the Oracle thin driver and Sun JDK 1.1.8
> >>
> >>The error message was:
> >>ORA-00604: error occurred at recursive SQL level 1
> >>ORA-01578: ORACLE data block corrupted (file # 2, block # 8596)
> >>ORA-01110: data file 2: '/ora00/oradata/RSM/rbs01.dbf'
> >>
> >>He was lucky the problem happened in a rollback segment tablespace
> >>rather than one with data since the database is *not* running in
> >>archivelog mode (don't ask) and they don't make cold datafile
> >>backups (don't ask)!
> >>
> >>I checked the disk for bad blocks with 'badblocks' and it came out
> >>clean. The machine is in a climate controlled environment so (FAB)
> >>issues like humidity and temperature don't really come into play.
> >>
> >>My boss wants to know the reason why this happend and how *I* can
> >>prevent it from happening again. Any help would be gratfully accepted.
> >>
> >>Thanks,
> >>Pascal
> >>
> >
> >
> >
>
Received on Fri Jan 24 2003 - 06:30:47 CST