Re: ORA-01578: Datablock corruption

From: Pascal Byrne <byrne_at_icada.com>
Date: Fri, 24 Jan 2003 09:30:14 +0100
Message-ID: <b0qtel$s3l0l$1@ID-147008.news.dfncis.de>

This is an interesting article. The trace events outlined may prove useful.

Thanks,
pascal

koert54 wrote:
> From Metalink:
>
>
>
> PURPOSE
>
> This article discusses block corruptions in Oracle and how they are related
>
> to the underlying operating system and hardware. To better illustrate the
>
> discussion, Unix is taken as the operating system of reference.
>
> SCOPE & APPLICATION
>
> For users requiring further understanding as to how a block could become
>
> corrupted.
>
>
>
> Block corruption has been a common occurrence on most UNIX based systems and
>
> relational databases for many years. It is one of the most frequent ways to
>
> lose data and cause serious business impact. Through a survey of the
>
> technical literature, this document will discuss several ways that block
>
> corruptions can occur, provide conclusions and possible solutions.
>
> To fully comprehend all the reasons for block corruptions, it is necessary to
>
> understand how I/O device subsystems work, how memory buffers are used to
>
> support the reading and writing of data blocks, how blocks are sized on both
>
> UNIX and Oracle, and how these three objects work together to maintain data
>
> consistency.
>
> I/O devices are designed specifically for host machines and there have been
>
> few attempts to standardize a particular interface across the industry. Most
>
> software, including Oracle, on UNIX machines uses standard C program calls
> that
>
> in turn perform system calls to support the reading and writing of data to
>
> disk. These system calls access I/O device software that retrieves or writes
>
> data on disk.
>
> The UNIX system contains two types of devices, block devices and raw or
>
> character devices. Block devices look like random access storage devices to
>
> the rest of the system while character devices include all other devices
> such
>
> as terminals and network media. (Bach, 1990 314). These device types are
>
> important to understand because different combinations can increase
> corruptions.
>
> Device drivers are configured by the operating system and the configuration
>
> procedure generates or populates tables that form part of the code of the
>
> kernel. This kernel to device driver interface is described by the block
>
> device switch table and the character device switch table. Each device type
>
> has entries in these tables that direct the kernel to the appropriate driver
>
> interfaces for the system calls. The open and close system calls of a device
>
> file funnel through the two device switch tables, according to file type.
> The
>
> mount and umount system calls also invoke the device open and close
> procedures
>
> for block devices. Read and write system calls of character special files
> pass
>
> through the respective procedures in the character device switch tables.
> Read
>
> and write system calls of block devices and of files on mounted file systems
>
> invoke the algorithms of the buffer cache, which invoke the device strategy
>
> procedure. (Bach, 1990 314). This buffer cache plays an important role in
>
> block corruptions since it is the location where data blocks are the most
>
> vulnerable.
>
> The difference between the two disk interfaces is whether they deal with the
>
> buffer cache. When accessing the block device interface, the UNIX kernel
>
> follows the same algorithm as for regular files, except that after
> converting
>
> the logical byte offset into a logical block offset, it treats the logical
>
> block offset as a physical block number in the file system. It then accesses
>
> the data via the buffer cache and, ultimately, the driver strategy
> interface.
>
> However, when accessing the disk via the raw interface, the kernel does not
>
> convert the byte offset into the file but passes the offset immediately to
> the
>
> driver. The driver's read or write routine converts the byte offset to a
>
> block offset and copies the data directly to the user address space,
> bypassing
>
> kernel buffers.
>
> Thus, if one process writes a block device and a second process then reads a
>
> raw device at the same address, the second process may not read the data
> that
>
> the first process had written, because the data may still be in the buffer
>
> cache and not on disk. However, if the second process had read the block
>
> device, it would automatically pick up the new data, as it exists in the
>
> buffer cache. (Bach, 1990 328).
>
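The stale-read scenario described above can be sketched with a toy model. This is only an illustration, not kernel code: the `ToyDisk` class and its method names are made up for this example, and the "platter" and "cache" are plain dictionaries.

```python
class ToyDisk:
    """Toy model of a disk with a write-behind buffer cache (hypothetical)."""

    def __init__(self):
        self.platter = {}  # what is physically on disk
        self.cache = {}    # dirty buffers that have not been flushed yet

    def block_write(self, blkno, data):
        # Block interface: the write lands in the buffer cache only.
        self.cache[blkno] = data

    def block_read(self, blkno):
        # Block interface: the cache is consulted before the platter.
        if blkno in self.cache:
            return self.cache[blkno]
        return self.platter.get(blkno)

    def raw_read(self, blkno):
        # Raw interface: bypasses the cache and sees only the platter.
        return self.platter.get(blkno)

    def sync(self):
        # Flush every dirty buffer to the platter.
        self.platter.update(self.cache)
        self.cache.clear()


disk = ToyDisk()
disk.platter[7] = b"old"
disk.block_write(7, b"new")

stale = disk.raw_read(7)       # raw reader still sees the old contents
fresh = disk.block_read(7)     # block reader sees the cached write
disk.sync()
after_sync = disk.raw_read(7)  # once flushed, the raw reader sees it too
```

The point of the sketch is only that the two interfaces can disagree until the cache is flushed, which is why mixing them on the same device is risky.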
> Use of the raw interface may also introduce strange behavior. If a process
>
> reads or writes a raw device in units smaller than the block size, results
> are
>
> driver-dependent. For instance, when issuing 1-byte writes to a tape drive,
>
> each byte may appear in different tape blocks. (Bach 1990)
>
> The advantage of using the raw interface is speed, assuming there is no
>
> advantage to caching data for later access. Processes accessing block
> devices
>
> transfer blocks of data whose size is constrained by the file system
> logical
>
> block size. Furthermore, use of the block interface entails an extra copy of
>
> data between user address space and kernel buffers, which is avoided in the
>
> raw interface. For example, if a file system has a logical block size of 1K
>
> bytes, at most 1K bytes are transferred per I/O operation. However,
> processes
>
> accessing the disk as a raw device can transfer many disk blocks during a
> disk
>
> operation, subject to the capabilities of the disk controller.
>
> Disk controllers are hardware devices that control the I/O actions of one or
>
> more disks. These controllers can also create a bottleneck in a system.
>
> (Corey, Abbey, Dechichio 1995). Controllers are the piece of hardware that
>
> most frequently causes problems on many systems. When a system has
>
> multiple disks controlled by one controller, the results can be fatal. The
>
> bottleneck on controllers is a common cause of write error.
>
> It is important to remember that Oracle and other products use these device
>
> access methods to perform their work. It is also important to note the added
>
> complexity that the Oracle kernel adds to the I/O game.
>
> The Oracle Relational Database Management System (RDBMS) keeps its
>
> information, including data, in block format. However, the Oracle data block
>
> can be, and in most cases is, composed of several operating system blocks.
>
> An Oracle database block is the physical unit of storage in which all Oracle
>
> database data are stored in files. The Oracle database block size is
>
> determined by setting a parameter called db_block_size when the
>
> database is created. (Millsap, 1995).
>
> The most common UNIX block is 512 bytes but the Oracle block size can range
>
> from 512 to 32K. The difference in block sizing between the operating system
>
> and the Oracle kernel is beneficial for Oracle, boosting performance
>
> while allowing UNIX to maintain small files with minimal wasted space. The
>
> Oracle block can be considered a superset of the UNIX file system block
> size.
>
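To make the block-size relationship concrete, here is a small arithmetic sketch. The function names are made up for illustration, the 8192-byte db_block_size is an assumption (the original post does not state it), and the offset formula ignores any platform-specific file header.

```python
OS_BLOCK_SIZE = 512  # bytes, the common UNIX block size cited in the article


def os_blocks_per_oracle_block(db_block_size, os_block_size=OS_BLOCK_SIZE):
    """Number of OS blocks that make up one Oracle block."""
    if db_block_size % os_block_size != 0:
        raise ValueError("Oracle block size must be a multiple of the OS block size")
    return db_block_size // os_block_size


def byte_offset_of_block(block_number, db_block_size):
    """Byte offset of an Oracle block, counted from the start of the file.

    Simplification: any platform-specific file header is ignored.
    """
    return block_number * db_block_size


ASSUMED_DB_BLOCK_SIZE = 8192  # assumption, not taken from the post

per_block = os_blocks_per_oracle_block(ASSUMED_DB_BLOCK_SIZE)
offset = byte_offset_of_block(8596, ASSUMED_DB_BLOCK_SIZE)  # block # from the ORA-01578
```

With these assumptions one Oracle block spans 16 OS blocks, so a single damaged or zeroed 512-byte OS block is enough to invalidate the whole Oracle block.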
> Each block of an Oracle data file is formatted with a fixed header that
>
> contains information about the particular block. This information provides a
>
> means to ensure the integrity for each block and in turn, the entire Oracle
>
> database. One component of the fixed header of a data block is called a Data
>
> Block Address (DBA). This DBA is a 48 bit integer that stores the file
> number
>
> of the Oracle database and the Oracle block number offset relative to the
>
> beginning of the file. (Presley, 1993).
>
> Whenever there is a problem with the DBA, Oracle will signal an Oracle error
>
> ORA-00600[3339][arg1][arg2] and possibly ORA-01578: Data block corrupted in
>
> file # block #. These errors provide information that point to where the
>
> corruption exists and can provide several potential causes. (Presley, 1993).
>
> The ORA-00600[3339] has two arguments that are meaningful to the person
>
> evaluating the corruption. Argument 1 is the DBA that Oracle found in the
> data
>
> block just read from disk. Argument 2 is the DBA Oracle expected to find in
>
> the data block it just read. If they are different, the ORA-00600[3339] is
>
> signaled.
>
> Oracle uses the standard C system function calls to read and write blocks to
>
> its database files. Once the block has been read it is mapped to shared
>
> memory by the operating system. After the block has been read into shared
>
> memory, the Oracle kernel does verification checks on the block to ensure
> the
>
> integrity of the fixed header. The DBA check is the first verification made
>
> on the fixed header. So why do DBAs become corrupt and how can we identify
>
> and correct them?
>
> Case One
>
> --------
>
> The first case of block corruption occurs when the first argument of the
>
> ORA-00600[3339] error has the value of zero while argument two contains the
> DBA
>
> which Oracle was trying to retrieve. Remember that argument 1 is the DBA
> just
>
> read from disk. Usually the first operating system block of an Oracle block
>
> is zeroed out when there was a soft error on disk and the operating system
>
> attempted to repair its block. In addition, disk repair utility programs
> have
>
> caused this zeroing out effect.
>
> Programs that read from and write to the disk directly can destroy the
>
> consistency of file system data. The file system algorithms coordinate disk
>
> I/O operation to maintain a consistent view of disk data structures,
> including
>
> linked lists of free disk blocks and pointers from inodes to direct and
>
> indirect data blocks. Processes that access the disk directly bypass these
> if
>
> they run while other file system activity is going on. For this reason,
> these
>
> programs should not be run on an active file system. (Bach, 1990 328).
>
> On some early versions of the Oracle RDBMS, a software bug specific to UNIX
>
> platforms also caused the ORA-00600[3339]. This bug was part of the code
> that
>
> dealt with multiple database writers.
>
> A database writer is a background process that is responsible for managing
> the
>
> contents of the data block buffer cache and the dictionary cache. It reads
>
> the blocks from the datafiles and stores them in the Shared Global Area
>
> (SGA). The database writer also performs batch writes of changed blocks back
>
> to the datafiles. (Loney, 1994 23). The SGA is a segment of memory allocated
>
> to Oracle that contains data and control information particular to an Oracle
>
> database instance.
>
> Using multiple database writers causes multiple background processes to
>
> perform disk operations at the same time. However, if there are process
>
> conflicts, incorrect values could be stored and corruption can occur. Also,
>
> using multiple database writers with asynchronous I/O has been known to cause
>
> similar results.
>
> Asynchronous I/O allows a process to execute a system call to start an I/O
>
> operation and have the system call return immediately after the operation is
>
> started or queued. Another system call is required to wait for the operation
>
> to complete. The advantage of asynchronous I/O is that a process can overlap
>
> its execution with I/O, or it can overlap I/O between different devices.
>
> (Stevens, 1990 163).
>
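The start-now, wait-later pattern can be imitated in a few lines. This sketch uses a thread pool rather than a real asynchronous I/O interface, so it is an analogy only: `start_read` is a made-up helper, and `os.pread` is available on UNIX-like systems.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def start_read(pool, fd, nbytes, offset):
    """Queue a read and return immediately with a handle (a Future)."""
    return pool.submit(os.pread, fd, nbytes, offset)


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    os.write(fd, b"A" * 512 + b"B" * 512)

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both reads are started before either one is waited for.
        first = start_read(pool, fd, 512, 0)
        second = start_read(pool, fd, 512, 512)

        # The caller is free to do other work while the reads are in flight.
        checksum = sum(range(1000))

        # The separate "wait for completion" step.
        block_a = first.result()
        block_b = second.result()
```

The separate wait step is where a conflict between several writers would have to be caught; if the completion of one request is mistaken for another, the wrong data can end up in the wrong buffer.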
> Case Two
>
> --------
>
> In the second case, both arguments returned with the ORA-00600[3339] error
> are
>
> large numbers. There are several causes that signal this error.
>
> The DBA in the physical block on disk is incorrect. This can happen if the
>
> block was corrupted in memory but was written to disk. This situation is
>
> quite rare and is usually caused by memory faults that go
>
> undetected. The DBA found in the block is usually garbage and not a valid
>
> DBA. Argument two that is returned with the error is always a valid DBA.
>
> If there is a possibility of memory problems on the system, the database
>
> administrator can enable further sanity block checking by placing the
>
> following event parameters in the database instance init.ora parameter file:
>
> event = "10210 trace name context forever, level 10"
>
> event = "10211 trace name context forever, level 10"
>
> _db_block_cache_protect = true
>
> These parameters force the Oracle RDBMS kernel to call functions that check
>
> the block header and block trailer to ensure that the block is in the proper
>
> format and has not been corrupted. The 10210 event parameter validates data
>
> blocks for tables while the 10211 validates data blocks for indexes. The
>
> _db_block_cache_protect=true protects the cache layer from becoming
> corrupted.
>
> This parameter will prevent certain corruption from getting to disk,
> although
>
> it may crash the foreground of the database instance. It will help catch
>
> stray writes in the cache. When a process tries to write past the buffer
> size
>
> in the SGA, it will fail first with a stack violation.
>
> If the database writer process detects a corrupted block in cache prior to
>
> writing the block to disk, it will signal an ORA-00600[3398] and will crash
> the
>
> database instance. The block that is corrupted is never written to disk.
>
> Various arguments, including the DBA, are passed to the ORA-00600[3398].
>
> After receiving such an error, simply attempt to restart the database instance.
>
> There is no doubt that this can be a costly workaround to avoid block
>
> corruptions. However, the workaround once a corruption has occurred can be
>
> even costlier.
>
> Blocks are sometimes written into the wrong places in the data file. This is
>
> called "write blocks out of sequence." In this case, both DBAs returned with
>
> the ORA-00600[3339] are valid. This typically happens when the operating
> system
>
> I/O device driver fails to write the block in the proper location that
> Oracle
>
> requested via the lseek() system call.
>
> Some hardware and operating system vendors support large files or "large file
>
> systems" that maintain files larger than 4.2 gigabytes. This is larger than
>
> what can be represented by a 32 bit unsigned integer. Therefore, the
>
> operating system must translate the offset transparently to the Oracle kernel.
>
> Oracle does not support files larger than 2 gigabytes even though the
>
> operating system might. On large file systems, the configuration is such
> that
>
> even smaller Oracle data files suffer corruptions caused by blocks being
>
> written out of sequence because the lseek() system call did not translate
> the
>
> correct location. (Velpuri, 1995).
>
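A short arithmetic sketch shows why offsets beyond 32 bits are dangerous. It assumes, purely for illustration, that a faulty layer truncates the offset to 32 bits; the article does not say this is the exact failure mode, and the helper names are made up.

```python
U32_MASK = 0xFFFFFFFF       # largest unsigned 32-bit value, about 4.29e9
S32_MAX = 2**31 - 1         # largest signed 32-bit value, just under 2 GiB


def truncate_to_u32(offset):
    """What is left of an offset if only the low 32 bits survive."""
    return offset & U32_MASK


def fits_signed_32(offset):
    """True if the offset can be held in a signed 32-bit integer."""
    return 0 <= offset <= S32_MAX


GIB = 1024**3
intended = 5 * GIB                   # where the block should go
landed = truncate_to_u32(intended)   # where a truncated offset points
```

Under that assumption a write intended for the 5 GiB mark lands at the 1 GiB mark, on top of an unrelated but perfectly valid block, which matches the "both DBAs are valid" symptom described above.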
> The lseek() system call is one of the most important calls related to block
>
> corruption. The calculations that lseek() performs are often the cause of
>
> block problems. To understand lseek() a brief discussion of byte positioning
>
> is necessary.
>
> Every open file has a "current byte position" associated with it. This is
>
> measured as the number of bytes from the start of the file. The creat system
>
> call sets the file's position to the beginning of the file, as does the open
>
> system call. The read and write system calls update the file's position by
>
> the number of bytes read or written. Before a read or write, an open file
> can
>
> be positioned using lseek(). The format is:
>
> long lseek(int fildes, long offset, int whence);
>
> The offset and whence arguments are interpreted as follows: If whence is 0,
>
> the file's position is set to offset bytes from the beginning of the file.
> If
>
> whence is 1, the file's position is set to its current position plus the
>
> offset. If whence is 2, the file's position is set to the size of the file
>
> plus the offset. The file's offset can be greater than the file's current
>
> size, in which case the next write to the file will extend the file. lseek()
>
> returns a long integer byte offset of the file. (Stevens, 1990 40).
>
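The three whence values can be tried out directly. The sketch below uses Python's `os.lseek`, which wraps the same system call; `os.SEEK_SET`, `os.SEEK_CUR` and `os.SEEK_END` correspond to whence 0, 1 and 2. The temporary file and its contents are only for demonstration.

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    os.write(fd, b"0123456789")             # file size 10, position 10

    pos_set = os.lseek(fd, 4, os.SEEK_SET)  # whence 0: 4 bytes from the start
    pos_cur = os.lseek(fd, 3, os.SEEK_CUR)  # whence 1: current position plus 3
    pos_end = os.lseek(fd, 5, os.SEEK_END)  # whence 2: file size plus 5

    # Writing past the end extends the file; the gap reads back as zero bytes.
    os.write(fd, b"X")
    size = os.fstat(fd).st_size

    os.lseek(fd, 10, os.SEEK_SET)
    gap = os.read(fd, 5)
```

Note that seeking past the end is not an error, so a miscalculated offset is accepted silently and only shows up later as a block in the wrong place.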
> There is great opportunity for miscalculation of an offset based on the
>
> lseek() system call. Though lseek() is not the only system call culprit in the
>
> block corruption problem, it is a major contributor.
>
> Case Three
>
> ----------
>
> A third cause for block corruption is the requested I/O not being serviced
> by
>
> the operating system. In this case, both arguments returned from the
>
> ORA-00600[3339] are valid but the DBA found in argument one is from the
> previous
>
> block read into shared memory prior to the current read request. The calls
>
> that Oracle makes to lseek() and read() are checked for return error codes.
>
> In addition, Oracle checks to see the number of bytes read in by the read()
>
> system call to ensure that the block size or a multiple of the block size
> was
>
> read. Since these checks appear to have succeeded, Oracle assumes
>
> that the direct read succeeded. Upon sanity checking, the DBA is incorrect
>
> and the database operation request fails. Therefore, the I/O read request
>
> really never took place. In this case, the DBA found can point to a block of
>
> a different file.
>
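The sequence of checks described for this case (byte count first, then the address stored in the block) can be sketched as follows. The block layout here is hypothetical: a 4-byte big-endian address at the start of each block stands in for the DBA in the real fixed header, and the 2048-byte block size, `make_block`, `read_block` and `BlockCheckError` are all invented for the example.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 2048  # assumed block size for this toy example


class BlockCheckError(Exception):
    """Raised when a block fails the read-size or address check."""


def make_block(dba, block_size=BLOCK_SIZE):
    # Hypothetical layout: 4-byte big-endian address, then zero padding.
    return struct.pack(">I", dba) + bytes(block_size - 4)


def read_block(fd, block_no, expected_dba, block_size=BLOCK_SIZE):
    data = os.pread(fd, block_size, block_no * block_size)
    # Check 1: the read must return exactly one full block.
    if len(data) != block_size:
        raise BlockCheckError("short read: %d bytes" % len(data))
    # Check 2: the address stored in the block must be the one requested.
    (found,) = struct.unpack_from(">I", data)
    if found != expected_dba:
        raise BlockCheckError("found DBA %d, expected %d" % (found, expected_dba))
    return data


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    # Slot 1 deliberately holds a second copy of block 100, imitating a
    # read that returned the previous block instead of the requested one.
    os.write(fd, make_block(100) + make_block(100))

    good = read_block(fd, 0, expected_dba=100)
    try:
        read_block(fd, 1, expected_dba=101)
        mismatch_detected = False
    except BlockCheckError:
        mismatch_detected = True
```

The second read passes the size check and fails only on the address comparison, which is the situation the article describes: every system call reported success, yet the data is not the block that was asked for.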
> Case Four
>
> ---------
>
> Another reason for block corruption is reading the wrong block from the same
>
> device. Typically, this is caused by a very busy disk. In some cases, the
>
> block read was off by 1 block but can range into several hundreds of blocks.
>
> The DBAs returned with the ORA-00600[3339] are valid DBAs but are not the block
>
> requested. Since this occurs when the disk is very busy and under lots of
>
> stress, try spreading datafiles across multiple disks and ensure that the
> disk
>
> drive can support the load.
>
>
>
> In the third and fourth situations, the database files will not be
> physically
>
> corrupted and the operation can be tried again with success. Most
> diagnostics
>
> testing will not reveal anything wrong with either the operating system or
> the
>
> hardware. However, the problem is due to operating system or hardware
> related
>
> problems. (Velpuri, 1995).
>
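The four cases can be summarised as a rough decision procedure on the two ORA-00600[3339] arguments. This is a reading aid built from the article's descriptions, not an Oracle tool: `classify_3339` is a made-up name, `is_valid_dba` is a hypothetical caller-supplied check, and the previously read DBA is something the caller would have to know from elsewhere.

```python
def classify_3339(found, expected, is_valid_dba, previous_dba=None):
    """Map the two ORA-00600[3339] arguments to the cases in the article.

    found        -- argument 1, the DBA read from disk
    expected     -- argument 2, the DBA Oracle asked for
    is_valid_dba -- hypothetical predicate: does this DBA exist in the database?
    previous_dba -- DBA of the block read just before this one, if known
    """
    if found == expected:
        return "no mismatch"
    if found == 0:
        return "case one: first OS block zeroed out"
    if not is_valid_dba(found):
        return "case two: garbage DBA, possible memory fault"
    if previous_dba is not None and found == previous_dba:
        return "case three: I/O request not serviced"
    return "case two or four: block written or read out of sequence"


known = {100, 101, 102}
valid = lambda dba: dba in known  # stand-in for a real validity check

zeroed = classify_3339(0, 101, valid)
garbage = classify_3339(987654321, 101, valid)
not_serviced = classify_3339(100, 101, valid, previous_dba=100)
out_of_sequence = classify_3339(102, 101, valid)
```

The last branch cannot separate a block written to the wrong place from a wrong block read off a busy disk; as the article notes, re-reading the block is what tells them apart, since only the former is a physical corruption.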
> So what causes the operating system calls to behave the way they do and how
>
> can companies try to minimize their risk? To evaluate these questions,
>
> another look into how UNIX works is required.
>
> UNIX vendors, in an attempt to speed performance, have implemented many
>
> features into the filesystem. The filesystem manages a large cache of I/O
>
> buffers, called the buffer cache. This cache allows UNIX to optimize read
> and
>
> write operations. When a program writes data, the filesystem stores the data
>
> in a buffer rather than writing it to disk immediately. At some later point
>
> in time, the system will send this data to the disk driver, together with
>
> other data that has accumulated in the cache. In other words, the buffer
>
> cache lets the disk driver schedule disk operations in batches. It can make
>
> larger transfers and use techniques such as seek optimization to make disk
>
> access more efficient. This is called write-behind.
>
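A program that cannot tolerate write-behind can ask the kernel to flush its data explicitly. The sketch below uses `os.fsync`, which wraps the fsync() system call; `durable_write` is a made-up helper, and nothing here is taken from the article as Oracle's own behaviour. Whether the drive's internal cache is also flushed depends on the hardware.

```python
import os
import tempfile


def durable_write(fd, data):
    """Write all of data, then ask the kernel to flush it to the device."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)  # may write fewer bytes than requested
        view = view[written:]
    os.fsync(fd)  # returns once the kernel reports the data as flushed


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    payload = b"block contents" * 100
    durable_write(fd, payload)
    read_back = os.pread(fd, len(payload), 0)
```

The trade-off is the one the article describes: the flush gives up the batching and seek optimization that make the buffer cache fast.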
> When a program reads data, the system first checks the buffer cache to see
> if
>
> the desired data is already there. If the data is already in the buffer
>
> cache, the filesystem does not need to access the disk for those blocks. It
>
> just gives the user the data it found in its buffer, eliminating the need to
>
> wait for a disk drive. The filesystem only needs to read the disk if the
> data
>
> isn't already in the cache. To increase efficiency even further, the
>
> filesystem assumes the program will read the file consecutively and read
>
> several blocks from the disk at once. This increases the likelihood that the
>
> data for future read operations will already be in the cache. (Loukides, M.,
>
> 1990) This also increases the chance of block corruption.
>
> As a filesystem gets busy and buffers are being read, modified, written, and
>
> aged out of the cache the chance of the kernel reading or writing the wrong
>
> block increases. Also, the more complex the scheme to read from and write to
>
> disk, the greater the likelihood of function failure.
>
> The UNIX kernel uses the strategy interface to transmit data between the
>
> buffer cache and a device, although the read and write procedures of
> character
>
> devices sometimes use their block counterpart strategy procedure to transfer
>
> data directly between the device and the user address space. The strategy
>
> procedure may queue I/O jobs for a device on a work list or do more
>
> sophisticated processing to schedule I/O jobs. Drivers can set up data
>
> transmission for one physical address or many, as appropriate. The UNIX
>
> kernel passes a buffer header address to the driver strategy procedure. The
>
> header contains a list of addresses and sizes for transmission of data to or
>
> from the device. This is also how the swapping operations work. For the
>
> buffer cache, the kernel transmits data from one address; when swapping, the
>
> kernel transmits data from many data addresses. If data is being copied to
> or
>
> from the user's address space, the driver must lock the process in memory
>
> until the I/O transfer is complete.
>
> The kernel loses control over a buffer only when it waits for the completion
>
> of I/O between the buffer and the disk. It is conceivable that a disk drive
>
> is corrupt so that it cannot interrupt the CPU, preventing the kernel from
>
> ever releasing the buffer. There are processes that monitor the hardware for
>
> such cases and zero out the block and return an error to the kernel for a
> bad
>
> disk job. (Bach, 1990 52).
>
> On the UNIX level there are several utilities that will check for bad disk
>
> blocks and zero out any blocks they find corrupted. These utilities do not
>
> realize that the block in question may be an Oracle RDBMS block and zero out
>
> the block by mistake.
>
> In (Silberschatz, Galvin, 1994), the authors consider the possible effect of
> a
>
> computer crash. In this case, the table of opened files is generally lost,
>
> and with it any changes in the directories of opened files. This event can
>
> leave the file system in an inconsistent structure. Frequently, a special
>
> program is run at reboot time to check for and correct disk inconsistencies.
>
> The consistency checker compares the data in the directory structure with
> the
>
> data blocks on disk, and tries to fix any inconsistencies it finds.
>
> (Silberschatz, Galvin, 1994) This will often result in the reformatting of
>
> blocks which will cause the Oracle block information to be removed. This
> will
>
> definitely cause Oracle corruption.
>
> It is important to realize that monitoring of hardware is required for all
>
> operating systems. Hardware monitors can sense electrical signals on the
>
> busses and can accurately record them even at high speed. A hardware monitor
>
> keeps observing the system even when it is malfunctioning, and thus, it can
> be
>
> used to debug the system. (Jain, 1991 99) These tools can help determine the
>
> cause of the problem and detect problems like controller error and media
>
> faulting which are frequent corruption contributors.
>
> In any case, there are many opportunities for blocks, either on disk or in
> the
>
> buffer cache, to become corrupt. Fixing the corruption can sometimes provide
>
> even greater opportunities.
>
>
>
> Conclusion
>
> ----------
>
> Data block corruption is an ongoing problem on all operating systems,
>
> especially UNIX. There are many types and causes of corruptions to consider.
>
> Advanced system configurations can increase the chance of corruption, and hardware problems
>
> are a common source of corruptions. When receiving block corruption errors,
>
> remember that a couple of them are not physical corruptions but memory
>
> corruptions that are never written to disk.
>
> Oracle Customer Support provides a number of bulletins on block corruption
>
> problems that help recover what is left of the data once corruption has
>
> occurred. If block corruption occurs on a machine, be sure to identify the
>
> type of corruption and establish a plan for its correction.
>
> "Pascal Byrne" <byrne_at_icada.com> wrote in message
> news:b0mb02$qmmgv$1_at_ID-147008.news.dfncis.de...
>

>>Hi,
>>One of our customers got this error on his production database
>>which is Oracle 8.0.5.1 on SuSE Linux 7.1 (kernel 2.4.16-4GB)
>>using non-raid removable SCSI disks. Database updates are
>>normally by JDBC using the Oracle thin driver and Sun JDK 1.1.8
>>
>>The error message was:
>>ORA-00604: error occurred at recursive SQL level 1
>>ORA-01578: ORACLE data block corrupted (file # 2, block # 8596)
>>ORA-01110: data file 2: '/ora00/oradata/RSM/rbs01.dbf'
>>
>>He was lucky the problem happened in a rollback segment tablespace
>>rather than one with data since the database is *not* running in
>>archivelog mode (don't ask) and they don't make cold datafile
>>backups (don't ask)!
>>
>>I checked the disk for bad blocks with 'badblocks' and it came out
>>clean. The machine is in a climate controlled environment so (FAB)
>>issues like humidity and temperature don't really come into play.
>>
>>My boss wants to know the reason why this happened and how *I* can
>>prevent it from happening again. Any help would be gratefully accepted.
>>
>>Thanks,
>>Pascal
>>

Received on Fri Jan 24 2003 - 02:30:14 CST