Corrupted data in Oracle7

From: <jsb_at_telerama.lm.com>
Date: 7 Dec 1994 00:45:58 -0500
Message-ID: <3c3i6m$mqg_at_tusk.lm.com>


We have recently completed a port of a fairly large application from Oracle6 to Oracle7. All was well until shortly after we installed the Oracle7 database in production. I first noticed the problem when trying to select count(*) from a table. I received this error:

>> SQL> select * from unit_history;
>> ERROR:
>> ORA-01578: ORACLE data block corrupted (file # 7, block # 37827)
>> ORA-01110: data file 7: '/db1/dynamic2.CIM'
>> ORA-00600: internal error code, arguments: [3339], [0], [469799875], [], [],
>> [], [], []
>>
>> no rows selected
>>
>> SQL>

The file /db1/dynamic2.CIM houses a large tablespace. Shortly after, I tried to get into SQL*Plus and received this error:

>> viper:cim:AM1 sqlplus /
>>
>> SQL*Plus: Release 3.1.3.2.1 - Production on Tue Dec 6 11:02:41 1994
>>
>> Copyright (c) Oracle Corporation 1979, 1992. All rights reserved.
>>
>> ERROR: ORA-00604: error occurred at recursive SQL level 1
>> ORA-01578: ORACLE data block corrupted (file # 13, block # 52537)
>> ORA-01110: data file 13: '/db1/rollbacks1.CIM'
>> ORA-00600: internal error code, arguments: [3339], [0], [872467769], [], [], [],
>> [], []

The only way to access the database was through SQL*DBA. I brought the database down and checked the file system for media errors; all was well. I brought the database back up and hit the same problem. I called Oracle. Tech support verified that the arguments in the ORA-00600 error were consistent with the file and block numbers reported two lines above it. Tech support said that 99% of the time, this means a device error. HP just happened to be onsite and ran a thorough set of diagnostics on the file system. Everything was fine. Must be software, he said. Since we were running the default two-task configuration with shadow processes, Oracle tech support said it would be very unlikely that a disgruntled process could corrupt the database.
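For anyone who wants to repeat the consistency check on the ORA-00600 arguments, here is a minimal Python sketch. It assumes the third argument is a 32-bit data block address with the file number in the top 6 bits and the block number in the low 26 bits. That split is an assumption, not something tech support stated, but it reproduces the file and block numbers in both errors quoted above. The name decode_dba is made up for illustration.

```python
# Assumed layout of the 32-bit data block address (DBA):
# top 6 bits = file number, low 26 bits = block number.
FILE_BITS = 6
BLOCK_BITS = 32 - FILE_BITS


def decode_dba(dba):
    """Split a data block address into (file number, block number)."""
    file_no = dba >> BLOCK_BITS
    block_no = dba & ((1 << BLOCK_BITS) - 1)
    return file_no, block_no


# Third ORA-00600 argument from each of the two errors in this post.
print(decode_dba(469799875))  # (7, 37827)
print(decode_dba(872467769))  # (13, 52537)
```

Both results match the file # and block # given in the ORA-01578 lines, so the internal error is pointing at the same blocks reported as corrupted.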

Later in the day, I talked to a colleague whom I respect very much and consider an Oracle guru, who just happened to be complaining about - you guessed it - internal Oracle errors on HP-UX, just like the ones above, that he has been fighting. This got me thinking. Surely, such a problem would have surfaced and become known by now. The Oracle tech support rep said that he was comfortable with us running 7.0.16 and didn't know of any "major defects" with that release. I took the offending rollback segment offline, then dropped and recreated it. There were no commits pending, but when I restarted the database, my original table, in its own tablespace, mind you, suddenly had no records. Oracle is analyzing redo dumps and is still claiming flaky hardware.

After two days of intense DBA'ing, Unix'ing and talking on phones, I'm convinced that it is *not* a hardware problem. Has anyone else seen similar problems? Often? Any root causes?

Scratching my head...

-- 
-Jeffrey Buck
 Enterprise Technology Group, Inc.
 jsb_at_etgroup.com
Received on Wed Dec 07 1994 - 06:45:58 CET
