Possible Oracle Bug, Oracle Claims No

From: Chuck <ccarson_at_echeeba.com>
Date: Tue, 11 Mar 2003 09:52:35 -0800
Message-ID: <3E6E2263.3090402@echeeba.com>

We have the following config: Oracle 8.1.7.4 32-bit running on Solaris 8 64-bit. Our datafile mount points are on hardware RAID 0+1 volume groups of 15 disks each (IBM FastT 700 RAID controllers). The box is running Veritas Database Edition for Oracle 3.5 MP2 (w/o Quick I/O).

Thus, I have three distinct software layers where I/O errors could be detected and logged: the RAID controller, the OS, and Veritas itself.

We received this error that resulted in a locked datafile:

Thu Mar 6 17:17:10 2003
Errors in file /u01/app/oracle/admin/chem1/bdump/chem1_ckpt_1002.trc:

ORA-01110: data file 24: '/u04/oradata/chem1/reg_index01.dbf'
ORA-01115: IO error reading block from file 24 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect

Additional information: 8192
Thu Mar 6 17:17:10 2003
Errors in file /u01/app/oracle/admin/chem1/bdump/chem1_ckpt_1002.trc:

ORA-01110: data file 24: '/u04/oradata/chem1/reg_index01.dbf'
ORA-01115: IO error reading block from file 24 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect

Additional information: 8192
Thu Mar 6 17:17:10 2003
CKPT: terminating instance due to error 1110
Instance terminated by CKPT, pid = 1002

Here are the contents of chem1_ckpt_1002.trc:

Unix process pid: 1002, image: oracle@db-0203 (CKPT)

2003-03-06 17:17:10.562
SESSION ID:(7.1) 2003-03-06 17:17:10.544
ORA-01110: data file 24: '/u04/oradata/chem1/reg_index01.dbf'
ORA-01115: IO error reading block from file 24 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
Additional information: 8192
error 1110 detected in background process
ORA-01110: data file 24: '/u04/oradata/chem1/reg_index01.dbf'
ORA-01115: IO error reading block from file 24 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
Additional information: 8192

During the time of the error, we were running an extensive index rebuild process that was using that datafile and a 3rd-party cartridge called Daylight (which is extremely poorly written, I might add). The datafile in question is a 2GB datafile; however, when I looked at the file within the OS, it displayed the file size as 400k.
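For anyone wanting to reproduce the symptom outside Oracle, here is a minimal sketch of the check that ORA-27063 seems to describe: ask the OS for one block and compare the byte count that comes back with the count requested. The function names are made up for illustration, the 8192-byte block size is an assumption taken from the "Additional information: 8192" line, and the offset assumes block N starts at N * block size. It says nothing about how Oracle itself performs the read.

```python
import os

# Assumption: 8192 is the database block size (from "Additional information: 8192").
BLOCK_SIZE = 8192


def check_block_read(path, block_no, block_size=BLOCK_SIZE):
    """Return (size_on_disk, bytes_read) for a single block-sized read.

    Assumes block N starts at byte offset N * block_size.
    """
    size_on_disk = os.stat(path).st_size
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread returns fewer bytes than requested if the file ends early.
        data = os.pread(fd, block_size, block_no * block_size)
    finally:
        os.close(fd)
    return size_on_disk, len(data)


def is_short_read(bytes_read, block_size=BLOCK_SIZE):
    """True when the OS returned a different byte count than was requested."""
    return bytes_read != block_size
```

Calling check_block_read('/u04/oradata/chem1/reg_index01.dbf', 1) on the affected box would show whether the OS-level size and a plain read of block 1 agree with what the database expects. A file that reports 400k where 2GB is expected would produce short reads for any block past the truncated length.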

So, we have absolutely no logs at the OS, Veritas, or RAID controller level; we have an error detected in an Oracle background process; only one datafile was affected even though there are many on that disk group; AND we just happened to be running a heavy job on the datafile at the time of the error. Oracle claims there was no Oracle problem and that there was a hardware I/O problem, like a bad block or something. In my 10+ years of experience I have never seen a RAID group produce an I/O error due to a bad disk block AND not log anything whatsoever. Solaris is usually very good at logging file system inconsistencies.

Just wanted to get feedback from other impartial DBAs.

Thanks for any input,
CC

Received on Tue Mar 11 2003 - 11:52:35 CST